About This Book


The primary objective of this user’s manual is to describe the functionality of the 
MPC750 RISC microprocessor family, which includes the MPC750, MPC755, MPC740 
and MPC745 microprocessors. Unless noted otherwise, descriptions in this manual that 
refer to MPC750 apply to all members of the MPC750 family. 





This book is intended as a companion to the Programming Environments Manual for 32-Bit 
Implementations of the PowerPC Architecture (referred to as the Programming 
Environments Manual). 


NOTE: About the Companion Programming Environments Manual 


The MPC750 RISC Microprocessor User’s Manual, which 
describes MPC750 features not defined by the architecture, is 
to be used with the Programming Environments Manual. 


Because the PowerPC architecture definition is flexible to 
support a broad range of processors, the Programming
Environments Manual describes generally those features 
common to these processors and indicates which features are 
optional or may be implemented differently in the design of 
each processor. 


Note that the Programming Environments Manual describes 
features of the PowerPC architecture only for 32-bit 
implementations. 


Contact your sales representative for a copy of the 
Programming Environments Manual. 


This document and the Programming Environments Manual distinguish between the three 
levels, or programming environments, of the PowerPC architecture, which are as follows: 


• PowerPC user instruction set architecture (UISA)—The UISA defines the level of
the architecture to which user-level software should conform. The UISA defines the 
base user-level instruction set, user-level registers, data types, memory conventions, 
and the memory and programming models seen by application programmers. 


• PowerPC virtual environment architecture (VEA)—The VEA, which is the smallest
component of the PowerPC architecture, defines additional user-level functionality
that falls outside typical user-level software requirements. The VEA describes the
memory model for an environment in which multiple processors or other devices can
access external memory and defines aspects of the cache model and cache control 
instructions from a user-level perspective. VEA resources are particularly useful for 
optimizing memory accesses and for managing resources in an environment in 
which other processors and other devices can access external memory. 


Implementations that conform to the VEA also conform to the UISA but may not 
necessarily adhere to the OEA. 


• PowerPC operating environment architecture (OEA)—The OEA defines
supervisor-level resources typically required by an operating system. It defines the 
memory management model, supervisor-level registers, and the exception model. 


Implementations that conform to the OEA also conform to the UISA and VEA. 


Note that some resources are defined more generally at one level in the architecture and 
more specifically at another. For example, conditions that cause a floating-point exception 
are defined by the UISA, but the exception mechanism itself is defined by the OEA.


Because it is important to distinguish between the levels of the architecture to ensure 
compatibility across multiple platforms, those distinctions are shown clearly throughout 
this book. 


For ease in reference, topics in this book are presented in the same order as the 
Programming Environments Manual. Topics build upon one another, beginning with a 
description and complete summary of the MPC750 programming model (registers and 
instructions) and progressing to more specific, architecture-based topics regarding the 
cache, exception, and memory management models. As such, chapters may include 
information from multiple levels of the architecture. For example, the discussion of the 
cache model uses information from both the VEA and the OEA. 


The PowerPC Architecture: A Specification for a New Family of RISC Processors defines 
the architecture from the perspective of the three programming environments and remains 
the defining document for the PowerPC architecture. For information about ordering 
Freescale documentation, see “Suggested Reading,” on page xxxiv. 


Information in this book is subject to change without notice, as described in the disclaimers 
on the title page of this book. As with any technical documentation, it is the readers’ 
responsibility to be sure they are using the most recent version of the documentation. 


For updates to this document, refer to http://www.freescale.com. 


Audience 


This manual is intended for system software and hardware developers and applications 
programmers who want to develop products for the MPC750. It is assumed that the reader 
understands operating systems, microprocessor system design, basic principles of RISC 
processing, and details of the PowerPC architecture. 

Organization


Following is a summary and a brief description of the major sections of this manual: 


• Chapter 1, “Overview,” is useful for readers who want a general understanding of
the features and functions of the PowerPC architecture and the MPC750. This
chapter describes the flexible nature of the PowerPC architecture definition, and
provides an overview of how the PowerPC architecture defines the register set,
operand conventions, addressing modes, instruction set, cache model, exception
model, and memory management model.

• Chapter 2, “Programming Model,” is useful for software engineers who need to
understand the MPC750-specific registers, operand conventions, and details
regarding how PowerPC instructions are implemented on the MPC750. Instructions
are organized by function.

• Chapter 3, “L1 Instruction and Data Cache Operation,” discusses the cache and
memory model as implemented on the MPC750.

• Chapter 4, “Exceptions,” describes the exception model defined in the PowerPC
OEA and the specific exception model implemented on the MPC750.

• Chapter 5, “Memory Management,” describes the MPC750’s implementation of the
memory management unit specifications provided by the OEA.

• Chapter 6, “Instruction Timing,” provides information about latencies, interlocks,
special situations, and various conditions to help make programming more efficient.
This chapter is of special interest to software engineers and system designers.

• Chapter 7, “Signal Descriptions,” describes signals of the MPC750.

• Chapter 8, “System Interface Operation,” describes signal timings for various
operations. It also provides information for interfacing to the MPC750.

• Chapter 9, “L2 Cache Interface Operation,” describes the use of the MPC750 L2
cache and cache controller. Note that this feature is not supported on the MPC740
or the MPC745.

• Chapter 10, “Power and Thermal Management,” provides information about power
saving and thermal management modes for the MPC750 family.

• Chapter 11, “Performance Monitor,” describes the operation of the performance
monitor diagnostic tool incorporated in the MPC750 family.

• Appendix A, “PowerPC Instruction Set Listings,” lists PowerPC instructions,
indicating those that are not implemented by the MPC750; it also includes those that
are specific to the MPC750. Separate tables are provided, listing the instructions by
mnemonic, opcode, function, and form. A quick reference table contains general
information for each instruction, such as the architecture level, privilege level, and
form, and indicates if the instruction is 64-bit and optional.

• Appendix B, “Instructions Not Implemented,” provides a list of the 32-bit and 64-bit
PowerPC instructions that are not implemented in the MPC750.

• Appendix C, “MPC755 Embedded G3 Microprocessor,” describes the differences
between the MPC750 and the MPC755. The appendix also serves to identify any
differences between the MPC740 and the MPC745.

• Appendix D, “User’s Manual Revision History,” provides a revision history for this
book, and identifies all the major changes that were made between Revision 0 of this
book and Revision 1.

• This manual also includes a glossary and an index.


Suggested Reading 


This section lists additional reading that provides background for the information in this 
manual as well as general information about the PowerPC architecture. 


General Information 


The following documentation, available through Morgan-Kaufmann Publishers, 340 Pine 
Street, Sixth Floor, San Francisco, CA, provides useful information about the PowerPC 
architecture and computer architecture in general: 


• The PowerPC Architecture: A Specification for a New Family of RISC Processors,
Second Edition, by International Business Machines, Inc.
For updates to the specification, see http://www.austin.ibm.com/tech/ppc-chg.html.

• PowerPC Microprocessor Common Hardware Reference Platform: A System
Architecture, by Apple Computer, Inc., International Business Machines, Inc., and
Freescale Semiconductor, Inc.

• Computer Architecture: A Quantitative Approach, Second Edition, by
John L. Hennessy and David A. Patterson

• Computer Organization and Design: The Hardware/Software Interface, Second
Edition, by David A. Patterson and John L. Hennessy


Related Documentation 


Freescale documentation is available from the sources listed on the back cover of this 
manual; the document order numbers are included in parentheses for ease in ordering: 


• Programming Environments Manual for 32-Bit Implementations of the PowerPC
Architecture (MPEFPC32B/AD)—Describes resources defined by the PowerPC
architecture.

• User’s manuals—These books provide details about individual implementations and
are intended for use with the Programming Environments Manual.

• Addenda/errata to user’s manuals—Because some processors have follow-on parts,
an addendum is provided that describes the additional features and functionality
changes. These addenda are intended for use with the corresponding user’s manuals.

• Hardware specifications—Hardware specifications provide specific data regarding
bus timing, signal behavior, and AC, DC, and thermal characteristics, as well as
other design considerations. Separate hardware specifications are provided for each
part described in this book.

• Technical summaries—Each device has a technical summary that provides an
overview of its features. This document is roughly equivalent to the overview
(Chapter 1) of an implementation’s user’s manual.

• The Programmer’s Reference Guide for the PowerPC Architecture:
MPCPRG/D—This concise reference includes the register summary, memory
control model, exception vectors, and the PowerPC instruction set.

• The Programmer’s Pocket Reference Guide for the PowerPC Architecture:
MPCPRGREF/D—This foldout card provides an overview of PowerPC registers,
instructions, and exceptions for 32-bit implementations.

• Application notes—These short documents address specific design issues useful to
programmers and engineers working with Freescale processors.


Additional literature is published as new processors become available. For a current list of 
documentation, refer to http://www.freescale.com. 


Conventions 

This document uses the following notational conventions: 

cleared/set     When a bit takes the value zero, it is said to be cleared; when it takes
                a value of one, it is said to be set.

mnemonics       Instruction mnemonics are shown in lowercase bold.

italics         Italics indicate variable command parameters, for example, bcctrx.
                Book titles in text are set in italics.
                Internal signals are set in italics, for example, qual BG.

0x0             Prefix to denote hexadecimal number

0b0             Prefix to denote binary number

rA, rB          Instruction syntax used to identify a source GPR

rD              Instruction syntax used to identify a destination GPR

frA, frB, frC   Instruction syntax used to identify a source FPR

frD             Instruction syntax used to identify a destination FPR

REG[FIELD]      Abbreviations for registers are shown in uppercase text. Specific bits,
                fields, or ranges appear in brackets. For example, MSR[LE] refers to
                the little-endian mode enable bit in the machine state register.

x               In some contexts, such as signal encodings, an unitalicized x
                indicates a don’t care.

x               An italicized x indicates an alphanumeric variable.

n               An italicized n indicates a numeric variable.

¬               NOT logical operator

&               AND logical operator

|               OR logical operator

Indicates reserved bits or bit fields in a register. Although these bits 
can be written to as ones or zeros, they are always read as zeros. 
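
The REG[FIELD], 0x, and 0b conventions above can be illustrated with a short Python sketch. It assumes the PowerPC bit-numbering convention, in which bit 0 is the most-significant bit of a 32-bit register, and it assumes that MSR[LE] occupies bit 31; neither detail is stated in this section, and the helper name `field` is hypothetical.

```python
def field(value, first_bit, last_bit=None, width=32):
    """Extract REG[first_bit:last_bit] with bit 0 as the most-significant bit.

    Assumption: PowerPC-style numbering on a 32-bit register.
    """
    if last_bit is None:
        last_bit = first_bit
    nbits = last_bit - first_bit + 1
    shift = (width - 1) - last_bit          # distance of the field from the LSB
    return (value >> shift) & ((1 << nbits) - 1)


MSR_LE_BIT = 31                             # assumption: position of MSR[LE]

msr = 0x00000001                            # hexadecimal literal (0x prefix)
le_is_set = field(msr, MSR_LE_BIT) == 1     # bit value one means "set"
same_value = 0x10 == 0b10000                # binary literal (0b prefix)
print(le_is_set, same_value)
```

A field that reads as zero is cleared in the sense defined above; a field that reads as one is set.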


Acronyms and Abbreviations 


Table i contains acronyms and abbreviations that are used in this document. 


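Table i. Acronyms and Abbreviated Terms

Term      Meaning

CR        Condition register
CQ        Completion queue
msb       Most-significant bit
MSR       Machine state register
NaN       Not a number
No-op     No operation
OEA       Operating environment architecture
PLL       Phase-locked loop
PLRU      Pseudo least-recently-used
PMCn      Performance monitor counter registers
POWER     Performance Optimized with Enhanced RISC architecture
PTE       Page table entry
PTEG      Page table entry group
PVR       Processor version register
RAW       Read-after-write
RISC      Reduced instruction set computer
RTL       Register transfer language
RWITM     Read with intent to modify
RWNITM    Read with no intent to modify
SDA       Sampled data address register
SDR1      Register that specifies the page table base address for virtual-to-physical address translation
SIA       Sampled instruction address register
SPR       Special-purpose register
SRn       Segment register
SRR0      Machine status save/restore register 0
SRR1      Machine status save/restore register 1
SRU       System register unit
TAU       Thermal assist unit
TB        Time base facility
TBL       Time base lower
TBU       Time base upper
THRMn     Thermal management registers
TLB       Translation lookaside buffer
XER       Register used for indicating conditions such as carries and overflows for integer operations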





Terminology Conventions 


Table ii describes terminology conventions used in this manual and the equivalent 
terminology used in the PowerPC architecture specification. 


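Table ii. Terminology Conventions

The Architecture Specification            This Manual

Data storage interrupt (DSI)              DSI exception
Extended mnemonics                        Simplified mnemonics
Fixed-point unit (FXU)                    Integer unit (IU)
Instruction storage interrupt (ISI)       ISI exception
Privileged mode (or privileged state)     Supervisor-level privilege
Problem mode (or problem state)           User-level privilege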





Table iii describes instruction field notation used in this manual.

Table iii. Instruction Field Conventions

The Architecture Specification     Equivalent to:

BA, BB, BT                         crbA, crbB, crbD (respectively)
BF, BFA                            crfD, crfS (respectively)
D                                  d
DS                                 ds
FLM                                FM
FRA, FRB, FRC, FRT, FRS            frA, frB, frC, frD, frS (respectively)
FXM                                CRM
RA, RB, RT, RS                     rA, rB, rD, rS (respectively)
SI                                 SIMM
U                                  IMM
UI                                 UIMM
/, //, ///                         0...0 (shaded)

Chapter 1
Overview 


This chapter provides an overview of the MPC750 microprocessor features, including a 
block diagram showing the major functional components. It provides information about 
how the MPC750 implementation complies with the PowerPC architecture definition. 


Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions 
for the MPC750 apply for the MPC755 except as noted in Appendix C, “MPC755 
Embedded G3 Microprocessor.” 


1.1 MPC750 Microprocessor Overview 


This section describes the features and general operation of the MPC750 and provides a 
block diagram showing major functional units. The MPC750 is a reduced instruction set 
computer (RISC) CPU, which implements the PowerPC architecture. The MPC750 
implements the 32-bit portion of the PowerPC architecture, which provides 32-bit effective 
addresses, integer data types of 8, 16, and 32 bits, and floating-point data types of 32 and 
64 bits. The MPC750 is a superscalar processor that can complete two instructions 
simultaneously. It incorporates the following six execution units: 


• Floating-point unit (FPU)

• Branch processing unit (BPU)

• System register unit (SRU)

• Load/store unit (LSU)

• Two integer units (IUs): IU1 executes all integer instructions. IU2 executes all
integer instructions except multiply and divide instructions.


The ability to execute several instructions in parallel and the use of simple instructions with 
rapid execution times yield high efficiency and throughput for MPC750-based systems. 
Most integer instructions execute in one clock cycle. The FPU is pipelined; the tasks it
performs are broken into subtasks, implemented as three successive stages. Typically, a
floating-point instruction occupies only one of the three stages at a time, freeing the
previous stage to work on the next floating-point instruction. Thus, three single-precision
floating-point instructions can be in the FPU execute stage at a time. Double-precision add
instructions have a three-cycle latency; double-precision multiply and multiply-add
instructions have a four-cycle latency.
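
The effect of the three-stage FPU pipeline on throughput can be sketched in a few lines of Python. The sketch assumes an idealized stream of independent floating-point instructions that each spend exactly one cycle in each of the three stages and never stall; the function name and the cycle-numbering scheme are illustrative only, and the four-cycle double-precision cases are not modeled.

```python
FPU_STAGES = 3   # from the text: three successive stages


def fpu_finish_cycles(n_instructions, stages=FPU_STAGES):
    """Cycle in which each instruction leaves the last stage.

    Assumptions: instruction i enters stage 1 in cycle i + 1 (cycles counted
    from 1), one new instruction enters per cycle, and nothing stalls.
    """
    return [i + stages for i in range(n_instructions)]


finish = fpu_finish_cycles(4)
pipelined_total = finish[-1]        # cycles until the last result is ready
unpipelined_total = 4 * FPU_STAGES  # if each instruction waited for the previous one
print(finish, pipelined_total, unpipelined_total)
```

Under these assumptions, four instructions finish in 6 cycles rather than 12, because up to three of them occupy different stages at the same time.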

Figure 1-1 shows the parallel organization of the execution units (shaded in the diagram).
The instruction unit fetches, dispatches, and predicts branch instructions. Note that this is 
a conceptual model that shows basic features rather than attempting to show how features 
are implemented physically. 


The MPC750 has independent on-chip, 32-Kbyte, eight-way set-associative, physically 
addressed caches for instructions and data and independent instruction and data memory 
management units (MMUs). Each MMU has a 128-entry, two-way set-associative 
translation lookaside buffer (DTLB and ITLB) that saves recently used page address 
translations. Block address translation is done through the four-entry instruction and data 
block address translation (IBAT and DBAT) arrays, defined by the PowerPC architecture. 
During block translation, effective addresses are compared simultaneously with all four 
BAT entries. For information about the L1 cache, see Chapter 3, “L1 Instruction and Data 
Cache Operation.” 
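
The L1 cache geometry follows from the figures given in this chapter: 32 Kbytes, eight ways, and the 32-byte cache block listed among the cache features. The Python sketch below derives the number of sets and splits a 32-bit physical address into tag, set index, and byte offset. The function name and the exact assignment of address bits to fields are assumptions made for illustration, not a description of the hardware.

```python
CACHE_BYTES = 32 * 1024   # 32-Kbyte cache
WAYS = 8                  # eight-way set-associative
BLOCK_BYTES = 32          # 32-byte (eight-word) cache block
ADDRESS_BITS = 32         # 32-bit physical address

SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1
INDEX_BITS = SETS.bit_length() - 1
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS


def split_physical_address(pa):
    """Return (tag, set_index, byte_offset) for a 32-bit physical address."""
    offset = pa & (BLOCK_BYTES - 1)
    index = (pa >> OFFSET_BITS) & (SETS - 1)
    tag = pa >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(SETS, OFFSET_BITS, INDEX_BITS, TAG_BITS)
print([hex(part) for part in split_physical_address(0x12345678)])
```

With these parameters each cache has 128 sets, so an address is divided into a 5-bit byte offset, a 7-bit set index, and a 20-bit tag.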


The L2 cache is implemented with an on-chip, two-way, set-associative tag memory, and 
with external, synchronous SRAMs for data storage. The external SRAMs are accessed 
through a dedicated L2 cache port that supports a single bank of up to 1 Mbyte of 
synchronous SRAMs. The L2 cache interface is not implemented in the MPC740. For 
information about the L2 cache implementation, see Chapter 9, “L2 Cache Interface 
Operation.” 
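
The 1-Mbyte limit on the external L2 data SRAMs is consistent with the 17-bit address bus and 64-bit data bus listed among the L2 cache interface features. The arithmetic is sketched below in Python, assuming that each L2 address selects one 64-bit (8-byte) location; that addressing granularity is an assumption, and the helper name is hypothetical.

```python
L2_ADDRESS_BITS = 17      # L2 interface address bus width
L2_DATA_BUS_BITS = 64     # L2 interface data bus width

BYTES_PER_LOCATION = L2_DATA_BUS_BITS // 8               # assumption: one address per 64-bit word
MAX_L2_BYTES = (1 << L2_ADDRESS_BITS) * BYTES_PER_LOCATION


def l2_address_bits_needed(cache_bytes):
    """Address bits needed to reach every 8-byte location of a cache this size."""
    locations = cache_bytes // BYTES_PER_LOCATION
    return (locations - 1).bit_length()


KBYTE = 1024
for size in (256 * KBYTE, 512 * KBYTE, 1024 * KBYTE):
    print(size // KBYTE, "Kbytes ->", l2_address_bits_needed(size), "address bits")
print(MAX_L2_BYTES == 1024 * KBYTE)
```

Under this assumption the three supported L2 sizes need 15, 16, and 17 address bits respectively, and 17 bits reach exactly 1 Mbyte.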


The MPC750 has a 32-bit address bus and a 64-bit data bus. Multiple devices compete for 
system resources through a central external arbiter. The MPC750’s three-state 
cache-coherency protocol (MEI) supports the exclusive, modified, and invalid states, a 
compatible subset of the MESI (modified/exclusive/shared/invalid) four-state protocol, and 
it operates coherently in systems with four-state caches. The MPC750 supports single-beat 
and burst data transfers for memory accesses and memory-mapped I/O operations. The 
system interface is described in Chapter 7, “Signal Descriptions,” and Chapter 8, “System
Interface Operation.” 
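
The three-state MEI protocol can be pictured as a small state machine for a single cache block. The Python sketch below is a simplified illustration only: the event names, the rule that any snoop hit invalidates the block, and the rule that modified data is pushed to memory first are assumptions chosen to show why no shared state is needed. The actual bus transactions and their effects are described in the cache and system interface chapters.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    INVALID = "I"


def next_state(state, event):
    """Return (new_state, push_to_memory) for one cache block.

    event is "load" or "store" from the local processor, or "snoop" for a
    snooped access by another bus master that hits this block (assumed names).
    """
    if event == "snoop":
        # Without a shared state the block is given up on a snoop hit;
        # modified data must reach memory first (simplifying assumption).
        return State.INVALID, state is State.MODIFIED
    if event == "store":
        return State.MODIFIED, False
    if event == "load":
        # A load miss fills the block as exclusive; a hit keeps its state.
        new_state = State.EXCLUSIVE if state is State.INVALID else state
        return new_state, False
    raise ValueError(f"unknown event: {event}")


state = State.INVALID
for event in ("load", "store", "snoop"):
    state, push = next_state(state, event)
    print(event, "->", state.value, "(push)" if push else "")
```

In this model a block is never held by two caches at once, which is how a three-state cache can coexist with four-state MESI caches on the same bus.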


The MPC750 has four software-controllable power-saving modes. Three static modes, 
doze, nap, and sleep, progressively reduce power dissipation. When functional units are 
idle, a dynamic power management mode causes those units to enter a low-power mode 
automatically without affecting operational performance, software execution, or external 
hardware. The MPC750 also provides a thermal assist unit (TAU) and a way to reduce the 
instruction fetch rate for limiting power dissipation. Power management is described in 
Chapter 10, “Power and Thermal Management.” 


The MPC750 uses an advanced CMOS process technology and is fully compatible with 
TTL devices. 
Figure 1-1. MPC750 Microprocessor Block Diagram

1.2 MPC750 Microprocessor Features


This section lists features of the MPC750. The interrelationship of these features is shown 
in Figure 1-1. 


1.2.1 Overview of the MPC750 Microprocessor Features

Major features of the MPC750 are as follows:


• High-performance, superscalar microprocessor


— As many as four instructions can be fetched from the instruction cache per clock 
cycle 


— As many as two instructions can be dispatched per clock 


— As many as six instructions can execute per clock (including two integer 
instructions) 


— Single-clock-cycle execution for most instructions 


• Six independent execution units and two register files


— BPU featuring both static and dynamic branch prediction 


64-entry (16-set, four-way set-associative) branch target instruction cache 
(BTIC), a cache of branch instructions that have been encountered in 
branch/loop code sequences. If a target instruction is in the BTIC, it is fetched 
into the instruction queue a cycle sooner than it can be made available from 
the instruction cache. Typically, if a fetch access hits the BTIC, it provides the 
first two instructions in the target stream. 


512-entry branch history table (BHT) with two bits per entry for four levels of 
prediction—not-taken, strongly not-taken, taken, strongly taken 


Branch instructions that do not update the count register (CTR) or link register 
(LR) are removed from the instruction stream. 


— Two integer units (IUs) that share thirty-two GPRs for integer operands 


IU1 can execute any integer instruction. 


IU2 can execute all integer instructions except multiply and divide
instructions (shift, rotate, arithmetic, and logical instructions). Most
instructions that execute in the IU2 take one cycle to execute. The IU2 has a
single-entry reservation station.


— Three-stage FPU 


Fully IEEE 754-1985-compliant FPU for both single- and double-precision 
operations 


Supports non-IEEE mode for time-critical operations 
Hardware support for denormalized numbers 


Single-entry reservation station 
Thirty-two 64-bit FPRs for single- or double-precision operands


— Two-stage LSU 


Two-entry reservation station 

Single-cycle, pipelined cache access 

Dedicated adder performs EA calculations 

Performs alignment and precision conversion for floating-point data 
Performs alignment and sign extension for integer data 

Three-entry store queue 

Supports both big- and little-endian modes 


— SRU handles miscellaneous instructions 


Executes CR logical and Move to/Move from SPR instructions (mtspr and 
mfspr) 


Single-entry reservation station 


• Rename buffers
— Six GPR rename buffers 
— Six FPR rename buffers 


— Condition register buffering supports two CR writes per clock 


• Completion unit


— The completion unit retires an instruction from the six-entry reorder buffer 
(completion queue) when all instructions ahead of it have been completed, the 
instruction has finished execution, and no exceptions are pending. 


— Guarantees sequential programming model (precise exception model) 


— Monitors all dispatched instructions and retires them in order 


— Tracks unresolved branches and flushes instructions from the mispredicted 
branch 


— Retires as many as two instructions per clock 


• Separate on-chip instruction and data caches (Harvard architecture)


— 32-Kbyte, eight-way set-associative instruction and data caches 


— Pseudo least-recently-used (PLRU) replacement algorithm 
— 32-byte (eight-word) cache block 


— Physically indexed/physical tags. (Note that the PowerPC architecture refers to 
physical address space as real address space.) 


— Cache write-back or write-through operation programmable on a per-page or 
per-block basis 


— Instruction cache can provide four instructions per clock; data cache can provide 
two words per clock 
— Caches can be disabled in software
— Caches can be locked in software
— Data cache coherency (MEI) maintained in hardware


— The critical double word is made available to the requesting unit when it is burst 
into the line-fill buffer. The cache is nonblocking, so it can be accessed during 
this operation. 


• Level 2 (L2) cache interface (The L2 cache interface is not supported in the
MPC740.) 


— On-chip two-way set-associative L2 cache controller and tags 

— External data SRAMs 

— Support for 256-Kbyte, 512-Kbyte, and 1-Mbyte L2 caches 

— 64-byte (256-Kbyte/512-Kbyte) and 128-byte (1 Mbyte) sectored line size 


— Supports flow-through (register-buffer), pipelined (register-register), and 
pipelined late-write (register-register) synchronous burst SRAMs 


• Separate memory management units (MMUs) for instructions and data
— 52-bit virtual address; 32-bit physical address 


— Address translation for 4-Kbyte pages, variable-sized blocks, and 256-Mbyte 
segments 


— Memory programmable as write-back/write-through, cacheable/noncacheable, 
and coherency enforced/coherency not enforced on a page or block basis 


— Separate IBATs and DBATs (four each) also defined as SPRs 
— Separate instruction and data translation lookaside buffers (TLBs) 


— Both TLBs are 128-entry, two-way set associative, and use LRU replacement 
algorithm 


— TLBs are hardware-reloadable (that is, the page table search is performed in 
hardware) 


• Separate bus interface units for system memory and for the L2 cache
— Bus interface features include the following: 


— Selectable bus-to-core clock frequency ratios of 2x, 2.5x, 3x, 3.5x, 4x, 4.5x ... 
8x. (2x to 8x, all half-clock multipliers in-between) 


— A 64-bit, split-transaction external data bus with burst transfers 

— Support for address pipelining and limited out-of-order bus transactions 
— Single-entry load queue 

— Single-entry instruction fetch queue 

— Two-entry L1 cache castout queue 




— No-DRTRY mode eliminates the DRTRY signal from the qualified bus grant. 
This allows the forwarding of data during load operations to the internal core 
one bus cycle sooner than if the use of DRTRY is enabled. 





— L2 cache interface features (which are not implemented on the MPC740) include
the following: 


— Core-to-L2 frequency divisors of 1, 1.5, 2, 2.5, and 3 
— Four-entry L2 cache castout queue in L2 cache BIU 
— 17-bit address bus 
— 64-bit data bus 
• Multiprocessing support features include the following:
— Hardware-enforced, three-state cache coherency protocol (MEI) for data cache. 


— Load/store with reservation instruction pair for atomic memory references, 
semaphores, and other multiprocessor operations 
• Power and thermal management
— Three static modes, doze, nap, and sleep, progressively reduce power 
dissipation: 
— Doze—All the functional units are disabled except for the time 
base/decrementer registers and the bus snooping logic. 
— Nap—The nap mode further reduces power consumption by disabling bus
snooping, leaving only the time base register and the PLL in a powered state. 
— Sleep—All internal functional units are disabled, after which external system 
logic may disable the PLL and SYSCLK. 


— Thermal management facility provides software-controllable thermal 
management. Thermal management is performed through the use of three 
supervisor-level registers and an MPC750-specific thermal management 
exception. 


— Instruction cache throttling provides control of instruction fetching to limit 
power consumption. 


• Performance monitor can be used to help debug system designs and improve
software efficiency. 


• In-system testability and debugging features through JTAG boundary-scan
capability 


1.2.2 Instruction Flow


As shown in Figure 1-1, the MPC750 instruction unit provides centralized control of 
instruction flow to the execution units. The instruction unit contains a sequential fetcher, 
six-entry instruction queue (IQ), dispatch unit, and BPU. It determines the address of the 
next instruction to be fetched based on information from the sequential fetcher and from
the BPU. 


See Chapter 6, “Instruction Timing,” for a detailed discussion of instruction timing. 


The sequential fetcher loads instructions from the instruction cache into the instruction 
queue. The BPU extracts branch instructions from the sequential fetcher. Branch 
instructions that cannot be resolved immediately are predicted using either the 
MPC750-specific dynamic branch prediction or the architecture-defined static branch 
prediction. 


Branch instructions that do not affect the LR or CTR are removed from the instruction 
stream. The BPU folds branch instructions when a branch is taken (or predicted as taken); 
branch instructions that are not taken, or predicted as not taken, are removed from the 
instruction stream through the dispatch mechanism. 


Instructions issued beyond a predicted branch do not complete execution until the branch 
is resolved, preserving the programming model of sequential execution. If branch 
prediction is incorrect, the instruction unit flushes all predicted path instructions, and 
instructions are fetched from the correct path. 


1.2.2.1 Instruction Queue and Dispatch Unit


The instruction queue (IQ), shown in Figure 1-1, holds as many as six instructions and 
loads up to four instructions from the instruction cache during a single processor clock 
cycle. The instruction fetcher continuously attempts to load as many instructions as there 
were vacancies in the IQ in the previous clock cycle. All instructions except branch 
instructions are dispatched to their respective execution units from the bottom two positions 
in the instruction queue (IQ0 and IQ1) at a maximum rate of two instructions per cycle.
Reservation stations are provided for the IU1, IU2, FPU, LSU, and SRU. The dispatch unit
checks for source and destination register dependencies, determines whether a position is 
available in the completion queue, and inhibits subsequent instruction dispatching as 
required. 
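
The fetch and dispatch limits described above can be explored with a small cycle-by-cycle model in Python. The model is a sketch under strong assumptions: the instruction stream contains no branches, every instruction cache access hits, no dispatch is ever inhibited by dependencies or a full completion queue, and the vacancy count used for fetching is the one left at the end of the previous cycle. All names are illustrative.

```python
from collections import deque

IQ_ENTRIES = 6               # instruction queue depth
MAX_FETCH_PER_CYCLE = 4      # instructions loaded from the instruction cache
MAX_DISPATCH_PER_CYCLE = 2   # dispatched from the bottom two IQ positions


def simulate_iq(n_instructions, n_cycles):
    """Return, per cycle, the list of instruction numbers dispatched."""
    iq = deque()
    next_to_fetch = 0
    vacancies_last_cycle = IQ_ENTRIES
    dispatched_per_cycle = []
    for cycle in range(n_cycles):
        # Dispatch in order from the bottom of the queue.
        n_dispatch = min(MAX_DISPATCH_PER_CYCLE, len(iq))
        dispatched_per_cycle.append([iq.popleft() for _ in range(n_dispatch)])
        # Fetch as many instructions as there were vacancies last cycle.
        remaining = n_instructions - next_to_fetch
        n_fetch = min(MAX_FETCH_PER_CYCLE, vacancies_last_cycle, remaining)
        for _ in range(n_fetch):
            iq.append(next_to_fetch)
            next_to_fetch += 1
        assert len(iq) <= IQ_ENTRIES
        vacancies_last_cycle = IQ_ENTRIES - len(iq)
    return dispatched_per_cycle


for cycle, group in enumerate(simulate_iq(10, 6)):
    print(cycle, group)
```

In this idealized stream the queue fills with four instructions in the first cycle, after which the two-per-cycle dispatch rate, not the four-per-cycle fetch rate, sets the pace.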


Branch instructions can be detected, decoded, and predicted from anywhere in the 
instruction queue. For a more detailed discussion of instruction dispatch, see Section 6.3.3, 
“Instruction Dispatch and Completion Considerations.” 


1.2.2.2 Branch Processing Unit (BPU) 


The BPU receives branch instructions from the sequential fetcher and performs CR 
lookahead operations on conditional branches to resolve them early, achieving the effect of 
a zero-cycle branch in many cases. 


Unconditional branch instructions and conditional branch instructions in which the 
condition is known can be resolved immediately. For unresolved conditional branch 
instructions, the branch path is predicted using either the architecture-defined static branch 
prediction or the MPC750-specific dynamic branch prediction. Dynamic branch prediction
is enabled if HID0[BHT] = 1.


When a prediction is made, instruction fetching, dispatching, and execution continue from 
the predicted path, but instructions cannot complete and write back results to architected 
registers until the prediction is determined to be correct (resolved). When a prediction is 
incorrect, the instructions from the incorrect path are flushed from the processor and 
processing begins from the correct path. The MPC750 allows a second branch instruction 
to be predicted; instructions from the second predicted instruction stream can be fetched 
but cannot be dispatched. 


Dynamic prediction is implemented using a 512-entry branch history table (BHT), a cache 
that provides two bits per entry that together indicate four levels of prediction for a branch 
instruction—not-taken, strongly not-taken, taken, strongly taken. When dynamic branch 
prediction is disabled, the BPU uses a bit in the instruction encoding to predict the direction 
of the conditional branch. Therefore, when an unresolved conditional branch instruction is 
encountered, the MPC750 executes instructions from the predicted target stream although 
the results are not committed to architected registers until the conditional branch is 
resolved. This execution can continue until a second unresolved branch instruction is 
encountered. 
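

The two-bit scheme described above can be illustrated with a small software model. This is only a sketch and not the MPC750's actual circuit: the table size (512 entries) and the two bits per entry come from the text, while the index function, the initial counter value, and the saturating update rule are assumptions made for illustration.

```python
BHT_ENTRIES = 512  # table size stated in the text


class BranchHistoryTable:
    """Toy model of a table of two-bit saturating counters."""

    def __init__(self):
        # Counter values range over 0..3; the starting value 1 is an assumption.
        self.counters = [1] * BHT_ENTRIES

    @staticmethod
    def index(branch_address):
        # Assumed indexing: drop the two byte-offset bits of the word-aligned
        # instruction address and keep the next nine bits.
        return (branch_address >> 2) % BHT_ENTRIES

    def predict_taken(self, branch_address):
        # Values 2 and 3 predict taken; values 0 and 1 predict not taken.
        return self.counters[self.index(branch_address)] >= 2

    def update(self, branch_address, taken):
        # Saturating increment when taken, saturating decrement otherwise.
        i = self.index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

With two bits of history, a single outcome that goes against a saturated counter does not flip the prediction, which is the usual motivation for keeping four levels rather than two.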


When a branch is taken (or predicted as taken), the instructions from the untaken path must 
be flushed and the target instruction stream must be fetched into the IQ. The BTIC is a 
64-entry cache that contains the most recently used branch target instructions, typically in 
pairs. When an instruction fetch hits in the BTIC, the instructions arrive in the instruction 
queue in the next clock cycle, a clock cycle sooner than they would arrive from the 
instruction cache. Additional instructions arrive from the instruction cache in the next clock 
cycle. The BTIC reduces the number of missed opportunities to dispatch instructions and 
gives the processor a one-cycle head start on processing the target stream. 


The BPU contains an adder to compute branch target addresses and three user-control 
registers—the link register (LR), the count register (CTR), and the CR. The BPU calculates 
the return pointer for subroutine calls and saves it into the LR for certain types of branch 
instructions. The LR also contains the branch target address for the Branch Conditional to 
Link Register (bclrx) instruction. The CTR contains the branch target address for the 
Branch Conditional to Count Register (bcctrx) instruction. Because the LR and CTR are 
SPRs, their contents can be copied to or from any GPR. Because the BPU uses dedicated 
registers rather than GPRs or FPRs, execution of branch instructions is largely independent 
from execution of integer and floating-point instructions. 


1.2.2.3 Completion Unit 


The completion unit operates closely with the instruction unit. Instructions are fetched and 
dispatched in program order. At the point of dispatch, the program order is maintained by 
assigning each dispatched instruction a successive entry in the six-entry completion queue. 
The completion unit tracks instructions from dispatch through execution and retires them 
in program order from the two bottom entries in the completion queue (CQ0 and CQ1). 


Instructions cannot be dispatched to an execution unit unless there is a vacancy in the 
completion queue. Branch instructions that do not update the CTR or LR are removed from 
the instruction stream and do not take an entry in the completion queue. Instructions that 
update the CTR and LR follow the same dispatch and completion procedures as non-branch 
instructions, except that they are not issued to an execution unit. 


Completing an instruction commits execution results to architected registers (GPRs, FPRs, 
LR, and CTR). In-order completion ensures the correct architectural state when the 
MPC750 must recover from a mispredicted branch or any exception. Retiring an instruction 
removes it from the completion queue. 
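

The dispatch and retirement rules above can be summarized in a short model. This is a sketch under stated assumptions: the queue depth (six) and retirement from the two oldest entries come from the text, while the class, its method names, and the idea of marking an entry as finished by name are hypothetical simplifications.

```python
from collections import deque

CQ_ENTRIES = 6        # completion queue depth stated in the text
RETIRE_PER_CYCLE = 2  # retirement from the two bottom entries (CQ0 and CQ1)


class CompletionQueue:
    """Toy in-order completion queue; the oldest entry is at the left."""

    def __init__(self):
        self.entries = deque()

    def can_dispatch(self):
        # Dispatch requires a vacancy in the completion queue.
        return len(self.entries) < CQ_ENTRIES

    def dispatch(self, name):
        if not self.can_dispatch():
            return False
        self.entries.append({"name": name, "finished": False})
        return True

    def finish(self, name):
        # Execution may finish out of order; only the flag changes here.
        for entry in self.entries:
            if entry["name"] == name:
                entry["finished"] = True

    def retire(self):
        # Retire up to two finished instructions, strictly from the oldest end.
        retired = []
        while (self.entries and len(retired) < RETIRE_PER_CYCLE
               and self.entries[0]["finished"]):
            retired.append(self.entries.popleft()["name"])
        return retired
```

Note that an instruction that finishes early still waits behind an older unfinished one, which is what preserves the architectural state on a mispredicted branch or exception.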


For a more detailed discussion of instruction completion, see Section 6.3.3, “Instruction 
Dispatch and Completion Considerations.” 


1.2.2.4 Independent Execution Units 


In addition to the BPU, the MPC750 provides the five execution units described in the 
following sections. 


1.2.2.4.1 Integer Units (IUs) 


The integer units IU1 and IU2 are shown in Figure 1-1. The IU1 can execute any integer 
instruction; the IU2 can execute any integer instruction except multiplication and division 
instructions. Each IU has a single-entry reservation station that can receive instructions 
from the dispatch unit and operands from the GPRs or the rename buffers. 


Each IU consists of three single-cycle subunits—a fast adder/comparator, a subunit for 
logical operations, and a subunit for performing rotates, shifts, and count-leading-zero 
operations. These subunits handle all one-cycle arithmetic instructions; only one subunit 
can execute an instruction at a time. 


The IU1 has a 32-bit integer multiplier/divider as well as the adder, shift, and logical units 
of the IU2. The multiplier supports early exit for operations that do not require full 
32 × 32-bit multiplication. 


Each IU has a dedicated result bus (not shown in Figure 1-1) that connects to rename 
buffers. 
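

The split of work between the two integer units can be expressed as a simple eligibility check. The sketch below is illustrative only: the function name is hypothetical, and the mnemonic set lists just the base (non-record) forms of the PowerPC integer multiply and divide instructions.

```python
# Base mnemonics of the PowerPC integer multiply and divide instructions.
MUL_DIV = {"mulli", "mullw", "mulhw", "mulhwu", "divw", "divwu"}


def eligible_integer_units(mnemonic):
    """Return the integer units that could execute the given instruction."""
    if mnemonic in MUL_DIV:
        return ["IU1"]          # only IU1 has the multiplier/divider
    return ["IU1", "IU2"]       # all other integer instructions run on either
```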


1.2.2.4.2 Floating-Point Unit (FPU) 


The FPU, shown in Figure 1-1, is designed such that single-precision operations require 
only a single pass, with a latency of three cycles. As instructions are dispatched to the FPU’s 
reservation station, source operand data can be accessed from the FPRs or from the FPR 
rename buffers. Results in turn are written to the rename buffers and are made available to 
subsequent instructions. Instructions pass through the reservation station in dispatch order. 


The FPU contains a single-precision multiply-add array and the floating-point status and 
control register (FPSCR). The multiply-add array allows the MPC750 to efficiently 
implement multiply and multiply-add operations. The FPU is pipelined so that one single- 
or double-precision instruction can be issued per clock cycle. Thirty-two 64-bit 
floating-point registers are provided to support floating-point operations. Stalls due to 
contention for FPRs are minimized by automatic allocation of the six floating-point rename 
registers. The MPC750 writes the contents of the rename registers to the appropriate FPR 
when floating-point instructions are retired by the completion unit. 


The MPC750 supports all IEEE 754 floating-point data types (normalized, denormalized, 
NaN, zero, and infinity) in hardware, eliminating the latency incurred by software 
exception routines. (Note that exception is also referred to as interrupt in the architecture 
specification.) 


1.2.2.4.3 Load/Store Unit (LSU) 


The LSU executes all load and store instructions and provides the data transfer interface 
between the GPRs, FPRs, and the cache/memory subsystem. The LSU calculates effective 
addresses, performs data alignment, and provides sequencing for load/store string and 
multiple instructions. 


Load and store instructions are issued and translated in program order; however, some 
memory accesses can occur out of order. Synchronizing instructions can be used to enforce 
strict ordering. When there are no data dependencies and the guarded bit for the page or 
block is cleared, a maximum of one out-of-order cacheable load operation can execute per 
cycle, with a two-cycle total latency on a cache hit. Data returned from the cache is held in 
a rename register until the completion logic commits the value to a GPR or FPR. Stores 
cannot be executed out of order and are held in the store queue until the completion logic 
signals that the store operation is to be completed to memory. The MPC750 executes store 
instructions with a maximum throughput of one per cycle and a three-cycle total latency to 
the data cache. The time required to perform the actual load or store operation depends on 
the processor/bus clock ratio and whether the operation involves the on-chip cache, the L2 
cache, system memory, or an I/O device. 


1.2.2.4.4 System Register Unit (SRU) 


The SRU executes various system-level instructions, as well as condition register logical 
operations and move to/from special-purpose register instructions. To maintain system 
state, most instructions executed by the SRU are execution-serialized; that is, the 
instruction is held for execution in the SRU until all previously issued instructions have 
executed. Results from execution-serialized instructions executed by the SRU are not 
available or forwarded for subsequent instructions until the instruction completes. 


1.2.3 Memory Management Units (MMUs) 


The MPC750’s MMUs support up to 4 Petabytes (2^52 bytes) of virtual memory and 4 Gigabytes 
(2^32 bytes) of physical memory for instructions and data. The MMUs also control access 
privileges for these spaces on block and page granularities. Referenced and changed status 
is maintained by the processor for each page to support demand-paged virtual memory 
systems. 


The LSU calculates effective addresses for data loads and stores; the instruction unit 
calculates effective addresses for instruction fetching. The MMU translates the effective 
address to determine the correct physical address for the memory access. 


The MPC750 supports the following types of memory translation: 


• Real addressing mode—In this mode, translation is disabled by clearing bits in the 
machine state register (MSR): MSR[IR] for instruction fetching or MSR[DR] for 
data accesses. When address translation is disabled, the physical address is identical 
to the effective address. 


• Page address translation—Translates the page frame address for a 4-Kbyte page size. 


• Block address translation—Translates the base address for blocks (128 Kbytes to 256 
Mbytes). 


If translation is enabled, the appropriate MMU translates the higher-order bits of the 
effective address into physical address bits. The lower-order address bits (which are 
untranslated and therefore considered both logical and physical) are directed to the on-chip 
caches where they form the index into the eight-way set-associative tag array. After 
translating the address, the MMU passes the higher-order physical address bits to the cache 
and the cache lookup completes. For caching-inhibited accesses or accesses that miss in the 
cache, the untranslated lower-order address bits are concatenated with the translated 
higher-order address bits; the resulting 32-bit physical address is used by the memory unit 
and the system interface, which accesses external memory. 
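

For page address translation, the concatenation described above amounts to a few bit operations. The sketch below assumes the 4-Kbyte page size mentioned earlier, so the low-order 12 bits pass through untranslated; the function name and the `physical_page_number` argument (standing in for the translated higher-order bits) are hypothetical.

```python
PAGE_OFFSET_BITS = 12  # 4-Kbyte page: the low-order 12 bits are untranslated


def physical_address(effective_address, physical_page_number):
    """Join translated high-order bits with the untranslated page offset."""
    offset = effective_address & ((1 << PAGE_OFFSET_BITS) - 1)
    # Mask to 32 bits, matching the 32-bit physical address in the text.
    return ((physical_page_number << PAGE_OFFSET_BITS) | offset) & 0xFFFFFFFF
```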


The TLBs store page address translations for recent memory accesses. For each access, an 
effective address is presented for page and block translation simultaneously. If a translation 
is found in both the TLB and the BAT array, the block address translation in the BAT array 
is used. Usually the translation is in a TLB and the physical address is readily available to 
the on-chip cache. When a page address translation is not in a TLB, hardware searches for 
one in the page table following the model defined by the PowerPC architecture. 
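

The selection among real addressing, block translation, and page translation can be sketched as follows. The MSR[IR] and MSR[DR] mask values follow the PowerPC bit numbering (bits 26 and 27 of the 32-bit MSR); everything else, including the function name and the three lookup callbacks, is a hypothetical stand-in for the hardware structures and is not taken from the manual.

```python
MSR_IR = 0x20  # MSR bit 26: instruction address translation enable
MSR_DR = 0x10  # MSR bit 27: data address translation enable


def translate(ea, msr, is_instruction, bat_lookup, tlb_lookup, table_search):
    """Return (physical_address, source) for an effective address.

    Each lookup callback returns a physical address or None on a miss.
    """
    enable = MSR_IR if is_instruction else MSR_DR
    if not (msr & enable):
        return ea, "real"            # translation disabled: PA equals EA

    # Hardware presents the address to both structures simultaneously.
    bat_pa = bat_lookup(ea)
    tlb_pa = tlb_lookup(ea)
    if bat_pa is not None:
        return bat_pa, "bat"         # a BAT hit takes priority over a TLB hit
    if tlb_pa is not None:
        return tlb_pa, "tlb"
    return table_search(ea), "table search"  # TLB miss: search the page table
```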


Instruction and data TLBs provide address translation in parallel with the on-chip cache 
access, incurring no additional time penalty in the event of a TLB hit. The MPC750’s TLBs 
are 128-entry, two-way set-associative caches that contain instruction and data address 
translations. The MPC750 automatically generates a TLB search on a TLB miss. 


1.2.4 On-Chip Instruction and Data Caches 


The MPC750 implements separate instruction and data caches. Each cache is 32-Kbyte and 
eight-way set associative. As defined by the PowerPC architecture, they are physically 
indexed. Each cache block contains eight contiguous words from memory that are loaded 
from an 8-word boundary (that is, bits EA[27–31] are zeros); thus, a cache block never 
crosses a page boundary. An entire cache block can be updated by a four-beat burst load. 
Misaligned accesses across a page boundary can incur a performance penalty. Caches are 
nonblocking, write-back caches with hardware support for reloading on cache misses. The 
critical double word is transferred on the first beat and is simultaneously written to the 
cache and forwarded to the requesting unit, minimizing stalls due to load delays. The cache 
being loaded is not blocked to internal accesses while the load completes. 
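

The cache parameters given above determine how an address divides into tag, set index, and byte offset. The following sketch works through that arithmetic; the sizes come from the text, while the function names are hypothetical.

```python
CACHE_BYTES = 32 * 1024  # 32-Kbyte cache
WAYS = 8                 # eight-way set associative
BLOCK_BYTES = 32         # eight 4-byte words per cache block


def cache_geometry():
    """Return (number of sets, byte-offset bits, set-index bits)."""
    sets = CACHE_BYTES // (WAYS * BLOCK_BYTES)
    offset_bits = BLOCK_BYTES.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(address):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    sets, offset_bits, index_bits = cache_geometry()
    offset = address & (BLOCK_BYTES - 1)
    index = (address >> offset_bits) & (sets - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset
```

The index and offset together occupy 12 bits, the same width as the byte offset within a 4-Kbyte page, so the set can be selected from address bits that do not change under page translation.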


The MPC750 cache organization is shown in Figure 1-2. 


Figure 1-2. Cache Organization 








Within one cycle, the data cache provides double-word access to the LSU. Like the 
instruction cache, the data cache can be invalidated all at once or on a per-cache-block 
basis. The data cache can be disabled and invalidated by clearing HID0[DCE] and setting 
HID0[DCFI]. The data cache can be locked by setting HID0[DLOCK]. To ensure cache 
coherency, the data cache supports the three-state MEI protocol. The data cache tags are 
single-ported, so a simultaneous load or store and a snoop access represent a resource 
collision. If a snoop hit occurs, the LSU is blocked internally for one cycle to allow the 
eight-word block of data to be copied to the write-back buffer. 


Within one cycle, the instruction cache provides up to four instructions to the instruction 
queue. The instruction cache can be invalidated entirely or on a cache-block basis. The 
instruction cache can be disabled and invalidated by clearing HID0[ICE] and setting 
HID0[ICFI]. The instruction cache can be locked by setting HID0[ILOCK]. The instruction 
cache supports only the valid/invalid states. 


The MPC750 also implements a 64-entry (16-set, four-way set-associative) branch target 
instruction cache (BTIC). The BTIC is a cache of branch instructions that have been 
encountered in branch/loop code sequences. If the target instruction is in the BTIC, it is 
fetched into the instruction queue a cycle sooner than it can be made available from the 
instruction cache. Typically the BTIC contains the first two instructions in the target stream. 
The BTIC can be disabled and invalidated through software. 


For more information and timing examples showing cache hit and cache miss latencies, see 
Section 6.3.2, “Instruction Fetch Timing.” 


1.2.5 L2 Cache Implementation (Not Supported in the MPC740) 


The L2 cache is a unified cache that receives memory requests from both the L1 instruction 
and data caches independently. The L2 cache is implemented with an on-chip, two-way, 
set-associative tag memory, and with external, synchronous SRAMs for data storage. The 
external SRAMs are accessed through a dedicated L2 cache port that supports a single bank 
of up to 1 Mbyte of synchronous SRAMs. The L2 cache normally operates in write-back 
mode and supports system cache coherency through snooping. 


Depending on its size, the L2 cache is organized into 64- or 128-byte lines, which in turn 
are subdivided into 32-byte sectors (blocks), the unit at which cache coherency is 
maintained. 


The L2 cache controller contains the L2 cache control register (L2CR), which includes bits 
for enabling parity checking, setting the L2-to-processor clock ratio, and identifying the 
type of RAM used for the L2 cache implementation. The L2 cache controller also manages 
the L2 cache tag array, two-way set-associative with 4K tags per way. Each sector (32-byte 
cache block) has its own valid and modified status bits. 
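

The relationship between L2 size, tag count, and line size can be checked with simple arithmetic. The sketch below assumes that each tag covers exactly one line and that every tag in both ways is in use; under that assumption it reproduces the 64- and 128-byte line sizes for 512-Kbyte and 1-Mbyte configurations. Other cache sizes may use the tag array differently and are not covered here. The function names are hypothetical.

```python
L2_WAYS = 2           # two-way set-associative tag array
TAGS_PER_WAY = 4096   # 4K tags per way
SECTOR_BYTES = 32     # one sector equals one 32-byte cache block


def l2_line_bytes(cache_bytes):
    """Line size if every tag maps exactly one line (assumption)."""
    return cache_bytes // (L2_WAYS * TAGS_PER_WAY)


def sectors_per_line(cache_bytes):
    """Number of 32-byte sectors in each line."""
    return l2_line_bytes(cache_bytes) // SECTOR_BYTES
```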


Requests from the L1 cache generally result from instruction misses, data load or store 
misses, write-through operations, or cache management instructions. Requests from the L1 
cache are looked up in the L2 tags and serviced by the L2 cache if they hit; they are 
forwarded to the bus interface if they miss. 


The L2 cache can accept multiple, simultaneous accesses. The L1 instruction cache can 
request an instruction at the same time that the L1 data cache is requesting one load and two 
store operations. The L2 cache also services snoop requests from the bus. If there are 
multiple pending requests to the L2 cache, snoop requests have highest priority. Load and 
store requests from the L1 data cache have the next priority, followed by instruction fetch 
requests from the L1 instruction cache. 
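

The priority order among pending L2 requests can be written as a fixed-priority selection. This is a minimal sketch: the three source labels and the function name are hypothetical, and the model ignores timing and queue depth.

```python
# Highest priority first, following the order given in the text.
PRIORITY = ["snoop", "data", "instruction"]


def next_request(pending):
    """Pick the highest-priority source among the pending request sources."""
    for source in PRIORITY:
        if source in pending:
            return source
    return None  # nothing pending
```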


For more information, see Chapter 9, “L2 Cache Interface Operation.” 


1.2.6 System Interface/Bus Interface Unit (BIU) 


The address and data buses operate independently; address and data tenures of a memory 
access are decoupled to provide a more flexible control of memory traffic. The primary 
activity of the system interface is transferring data and instructions between the processor 
and system memory. There are two types of memory accesses: 


• Single-beat transfers—These memory accesses allow transfer sizes of 8, 16, 24, 32, 
or 64 bits in one bus clock cycle. Single-beat transactions are caused by uncacheable 
read and write operations that access memory directly (that is, when caching is 
disabled), cache-inhibited accesses, and stores in write-through mode. 


• Four-beat burst (32 bytes) data transfers—Burst transactions, which always transfer 
an entire cache block (32 bytes), are initiated when an entire cache block is 
transferred. Because the first-level caches on the MPC750 are write-back caches, 
burst-read memory operations are the most common memory accesses, 
followed by burst-write memory operations, and single-beat (noncacheable or 
write-through) memory read and write operations. 
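

The two transfer types can be related by simple arithmetic: a 32-byte burst in four beats implies 8 bytes (64 bits) per beat, which matches the largest single-beat size. The sketch below encodes this; the constant and function names are hypothetical.

```python
DATA_BYTES_PER_BEAT = 8                     # 64 bits moved per data beat
BURST_BEATS = 4                             # four-beat burst
SINGLE_BEAT_SIZES_BITS = (8, 16, 24, 32, 64)  # sizes listed in the text


def data_beats(size_bits, burst):
    """Number of data beats for a transfer."""
    if burst:
        return BURST_BEATS                  # bursts always move a whole block
    if size_bits not in SINGLE_BEAT_SIZES_BITS:
        raise ValueError("unsupported single-beat transfer size")
    return 1


def burst_bytes():
    """Bytes moved by one burst transaction."""
    return DATA_BYTES_PER_BEAT * BURST_BEATS
```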


The MPC750 also supports address-only operations, variants of the burst and single-beat 
operations (for example, atomic memory operations and global memory operations that are 
snooped), and address retry activity (for example, when a snooped read access hits a 
modified block in the cache). The broadcast of some address-only operations is controlled 
through HID0[ABE]. I/O accesses use the same protocol as memory accesses. 


Access to the system interface is granted through an external arbitration mechanism that 
allows devices to compete for bus mastership. This arbitration mechanism is flexible, 
allowing the MPC750 to be integrated into systems that implement various fairness and bus 
parking procedures to avoid arbitration overhead. 


Typically, memory accesses are weakly ordered—sequences of operations, including 
load/store string and multiple instructions, do not necessarily complete in the order they 
begin—maximizing the efficiency of the bus without sacrificing data coherency. The 
MPC750 allows read operations to go ahead of store operations (except when a dependency 
exists, or in cases where a noncacheable access is performed), and provides support for a 
write operation to go ahead of a previously queued read data tenure (for example, letting a 
snoop push be enveloped between address and data tenures of a read operation). Because 
the MPC750 can dynamically optimize run-time ordering of load/store traffic, overall 
performance is improved. 


The system interface is specific for each microprocessor. 


The MPC750 signals are grouped as shown in Figure 1-3. Signals are provided for clocking 
and control of the L2 caches, as well as separate L2 address and data buses. Test and control 
signals provide diagnostics for selected internal circuits. 


[Figure 1-3 diagram: the MPC750 and its signal groups. Address arbitration, address start, 
address transfer, transfer attribute, address termination, clocks, system status, 
data arbitration, data transfer, data termination, L2 cache clock/control, 
L2 cache address/data, processor status/control, test and control, and the 
VDD and VDD(I/O) supplies. The L2 cache signal groups are not supported in the MPC740.] 

Figure 1-3. System Interface 


The system interface supports address pipelining, which allows the address tenure of one 
transaction to overlap the data tenure of another. The extent of the pipelining depends on 
external arbitration and control circuitry. Similarly, the MPC750 supports split-bus 
transactions for systems with multiple potential bus masters—one device can have 
mastership of the address bus while another has mastership of the data bus. Allowing 
multiple bus transactions to occur simultaneously increases the available bus bandwidth for 
other activity. 


The MPC750’s clocking structure supports a wide range of processor-to-bus clock ratios. 


1.2.7 Signals 


The MPC750’s signals are grouped as follows: 


• Address arbitration signals—The MPC750 uses these signals to arbitrate for address 
bus mastership. 


• Address start signals—These signals indicate that a bus master has begun a 
transaction on the address bus. 


• Address transfer signals—These signals include the address bus and address parity 
signals. They are used to transfer the address and to ensure the integrity of the 
transfer. 


• Transfer attribute signals—These signals provide information about the type of 
transfer, such as the transfer size and whether the transaction is bursted, 
write-through, or caching-inhibited. 


• Address termination signals—These signals are used to acknowledge the end of the 
address phase of the transaction. They also indicate whether a condition exists that 
requires the address phase to be repeated. 


• Data arbitration signals—The MPC750 uses these signals to arbitrate for data bus 
mastership. 


• Data transfer signals—These signals, which consist of the data bus and data parity 
signals, are used to transfer the data and to ensure the integrity of the transfer. 


• Data termination signals—Data termination signals are required after each data beat 
in a data transfer. In a single-beat transaction, a data termination signal also indicates 
the end of the tenure; in burst accesses, data termination signals apply to individual 
beats and indicate the end of the tenure only after the final data beat. They also 
indicate whether a condition exists that requires the data phase to be repeated. 


• L2 cache clock/control signals—These signals provide clocking and control for the 
L2 cache. (Not supported in the MPC740.) 


• L2 cache address/data—The MPC750 has separate address and data buses for 
accessing the L2 cache. (Not supported in the MPC740.) 


• Interrupt signals—These signals include the interrupt signal, checkstop signals, and 
both soft reset and hard reset signals. These signals are used to generate interrupt 
exceptions and, under various conditions, to reset the processor. 


• Processor status/control signals—These signals are used to set the reservation 
coherency bit, enable the time base, and other functions. 


• Miscellaneous signals—These signals are used in conjunction with such resources 
as secondary caches and the time base facility. 


• JTAG/COP interface signals—The common on-chip processor (COP) unit provides 
a serial interface to the system for performing board-level boundary scan 
interconnect tests. 


• Clock signals—These signals determine the system clock frequency. These signals 
can also be used to synchronize multiprocessor systems. 


NOTE 


A bar over a signal name indicates that the signal is active 
low—for example, ARTRY (address retry) and TS (transfer 
start). Active-low signals are referred to as asserted (active) 
when they are low and negated when they are high. Signals that 
are not active low, such as AP[0–3] (address bus parity signals) 
and TT[0–4] (transfer type signals) are referred to as asserted 
when they are high and negated when they are low. 
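

The asserted/negated convention in the note can be stated as a one-line rule. The sketch below is illustrative; the function name is hypothetical, and a level of 0 or 1 stands for a low or high signal.

```python
def is_asserted(level, active_low):
    """Return True if a signal at the given level (0 = low, 1 = high) is asserted."""
    # Active-low signals are asserted when low; all others are asserted when high.
    return (level == 0) if active_low else (level == 1)
```

For example, an active-low signal such as ARTRY is asserted at level 0, while a signal such as one of the TT[0–4] bits is asserted at level 1.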


1.2.8 Signal Configuration 


Figure 1-4 shows the MPC750’s logical pin configuration. The signals are grouped by 
function. 


Figure 1-4. MPC750 Microprocessor Signal Groups 


Signal functionality is described in detail in Chapter 7, “Signal Descriptions,” and 
Chapter 8, “System Interface Operation.” 


1.2.9 Clocking 


The MPC750 requires a single system clock input, SYSCLK, that represents the bus 
interface frequency. Internally, the processor uses a phase-locked loop (PLL) circuit to 
generate a master core clock that is frequency-multiplied and phase-locked to the SYSCLK 
input. This core frequency is used to operate the internal circuitry. 


The PLL is configured by the PLL_CFG[0-3] signals, which select the multiplier that the 
PLL uses to multiply the SYSCLK frequency up to the internal core frequency. The 
feedback in the PLL guarantees that the processor clock is phase locked to the bus clock, 
regardless of process variations, temperature changes, or parasitic capacitances. The PLL 
also ensures a 50% duty cycle for the processor clock. 


The MPC750 supports various processor-to-bus clock frequency ratios, although not all 
ratios are available for all frequencies. Configuration of the processor/bus clock ratios is 
displayed through an MPC750-specific register, HID1. For information about supported 
clock frequencies, see the MPC750 hardware specifications. 
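

The relationship between SYSCLK and the core clock is a plain multiplication. In the sketch below, the function name and the example values are hypothetical; the actual PLL_CFG[0-3] encodings and the ratios supported at each frequency are given in the hardware specifications and are not reproduced here.

```python
def core_frequency_mhz(sysclk_mhz, multiplier):
    """Core clock frequency for a given bus clock and PLL multiplier."""
    # The multiplier is whatever ratio the PLL configuration selects;
    # valid ratios depend on the part and frequency (see hardware specs).
    return sysclk_mhz * multiplier
```

For instance, a hypothetical 66-MHz bus clock with a 3.5x ratio would give a 231-MHz core clock.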


1.3 MPC750 Microprocessor Implementation 


The PowerPC architecture is derived from the POWER architecture (Performance 
Optimized with Enhanced RISC architecture). The PowerPC architecture shares the 
benefits of the POWER architecture optimized for single-chip implementations. The 
PowerPC architecture design facilitates parallel instruction execution and is scalable to take 
advantage of future technological gains. 


This section describes the PowerPC architecture in general, and specific details about the 
implementation of the MPC750 as a low-power, 32-bit device that implements this 
architecture. The structure of this section follows the organization of the user’s manual; 
each subsection provides an overview of each chapter. 


• Registers and programming model—Section 1.4, “PowerPC Registers and 
Programming Model,” describes the registers for the operating environment 
architecture common among processors of this family and describes the 
programming model. It also describes the registers that are unique to the MPC750. 
The information in this section is described more fully in Chapter 2, “Programming 
Model.” 


• Instruction set and addressing modes—Section 1.5, “Instruction Set,” describes the 
PowerPC instruction set and addressing modes for the PowerPC operating 
environment architecture, and defines and describes the PowerPC instructions 
implemented in the MPC750. The information in this section is described more fully 
in Chapter 2, “Programming Model.” 


• Cache implementation—Section 1.6, “On-Chip Cache Implementation,” describes 
the cache model that is defined generally by the virtual environment architecture. It 
also provides specific details about the MPC750 cache implementation. The 
information in this section is described more fully in Chapter 3, “L1 Instruction and 
Data Cache Operation.” 


• Exception model—Section 1.7, “Exception Model,” describes the exception model 
of the PowerPC operating environment architecture and the differences in the 
MPC750 exception model. The information in this section is described more fully 
in Chapter 4, “Exceptions.” 


• Memory management—Section 1.8, “Memory Management,” describes generally 
the conventions for memory management among the processors of this family. This 
section also describes the MPC750’s implementation of the 32-bit PowerPC 
memory management specification. The information in this section is described 
more fully in Chapter 5, “Memory Management.” 


• Instruction timing—Section 1.9, “Instruction Timing,” provides a general 
description of the instruction timing provided by the superscalar, parallel execution 
supported by the PowerPC architecture and the MPC750. The information in this 
section is described more fully in Chapter 6, “Instruction Timing.” 


• Power management—Section 1.10, “Power Management,” describes how the power 
management can be used to reduce power consumption when the processor, or 
portions of it, are idle. The information in this section is described more fully in 
Chapter 10, “Power and Thermal Management.” 


• Thermal management—Section 1.11, “Thermal Management,” describes how the 
thermal management unit and its associated registers (THRM1–THRM3) and 
exception can be used to manage system activity in a way that prevents exceeding 
system and junction temperature thresholds. This is particularly useful in 
high-performance portable systems, which cannot use the same cooling mechanisms 
(such as fans) that control overheating in desktop systems. The information in this 
section is described more fully in Chapter 10, “Power and Thermal Management.” 


• Performance monitor—Section 1.12, “Performance Monitor,” describes the 
performance monitor facility, which system designers can use to help bring up, 
debug, and optimize software performance. The information in this section is 
described more fully in Chapter 10, “Power and Thermal Management.” 


The following sections summarize the features of the MPC750, distinguishing those that 
are defined by the architecture from those that are unique to the MPC750 
implementation. 


The PowerPC architecture consists of the following layers, and adherence to the PowerPC 
architecture can be described in terms of which of the following levels of the architecture 
is implemented: 


• PowerPC user instruction set architecture (UISA)—Defines the base user-level 
instruction set, user-level registers, data types, floating-point exception model, 
memory models for a uniprocessor environment, and programming model for a 
uniprocessor environment. 


• PowerPC virtual environment architecture (VEA)—Describes the memory model 
for a multiprocessor environment, defines cache control instructions, and describes 
other aspects of virtual environments. Implementations that conform to the VEA 
also adhere to the UISA, but may not necessarily adhere to the OEA. 


• PowerPC operating environment architecture (OEA)—Defines the memory 
management model, supervisor-level registers, synchronization requirements, and 
the exception model. Implementations that conform to the OEA also adhere to the 
UISA and the VEA. 


The PowerPC architecture allows a wide range of designs for such features as cache and 
system interface implementations. The MPC750 implementations support the three levels 
of the architecture described above. For more information about the PowerPC architecture, 
see the Programming Environments Manual. 


Specific features of the MPC750 are listed in Section 1.2, “MPC750 Microprocessor 
Features.” 


1.4 PowerPC Registers and Programming Model 


The PowerPC architecture defines register-to-register operations for most computational 
instructions. Source operands for these instructions are accessed from the registers or are 
provided as immediate values embedded in the instruction opcode. The three-register 
instruction format allows specification of a target register distinct from the two source 
operands. Load and store instructions transfer data between registers and memory. 


Processors of this family have two levels of privilege—supervisor mode of operation 
(typically used by the operating system) and user mode of operation (used by the 
application software). The programming models incorporate 32 GPRs, 32 FPRs, 
special-purpose registers (SPRs), and several miscellaneous registers. Each microprocessor 
also has its own unique set of hardware implementation-dependent (HID) registers. 


Having access to privileged instructions, registers, and other resources allows the operating 
system to control the application environment (providing virtual memory and protecting 
operating-system and critical machine resources). Instructions that control the state of the 
processor, the address translation mechanism, and supervisor registers can be executed only 
when the processor is operating in supervisor mode. 


Figure 1-5 shows all the MPC750 registers available at the user and supervisor level. The 
numbers to the right of the SPRs indicate the number that is used in the syntax of the 
instruction operands to access the register. 


For more information, see Chapter 2, “Programming Model.” 


1 These registers are MPC750-specific registers. They may not be supported by other processors that implement the 
PowerPC architecture. 


2 Not supported by the MPC740. 
Figure 1-5. MPC750 Microprocessor Programming Model—Registers 


The following tables summarize the registers implemented in the MPC750; Table 1-1 
describes registers (excluding SPRs) defined by the architecture. 


Table 1-1. Architecture-Defined Registers on the MPC750 (Excluding SPRs) 

Register | Level | Function 
CR | User | The condition register (CR) consists of eight four-bit fields that reflect the results of certain operations, such as move, integer and floating-point compare, arithmetic, and logical instructions, and provide a mechanism for testing and branching. 
FPRs | User | The 32 floating-point registers (FPRs) serve as the data source or destination for floating-point instructions. These 64-bit registers can hold either single- or double-precision floating-point values. 
FPSCR | User | The floating-point status and control register (FPSCR) contains the floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the IEEE-754 standard. 
GPRs | User | The 32 GPRs serve as the data source or destination for integer instructions. 
MSR | Supervisor | The machine state register (MSR) defines the processor state. Its contents are saved when an exception is taken and restored when exception handling completes. The MPC750 implements MSR[POW] (defined by the architecture as optional), which is used to enable the power management feature. The MPC750-specific MSR[PM] bit is used to mark a process for the performance monitor. 
SR0–SR15 | Supervisor | The sixteen 32-bit segment registers (SRs) define the 4-Gbyte space as sixteen 256-Mbyte segments. The MPC750 implements segment registers as two arrays: a main array for data accesses and a shadow array for instruction accesses; see Figure 1-1. Loading a segment entry with the Move to Segment Register (mtsr) instruction loads both arrays. The mfsr instruction reads the master register, shown as part of the data MMU in Figure 1-1. 

The OEA defines numerous special-purpose registers that serve a variety of functions, such 
as providing controls, indicating status, configuring the processor, and performing special 
operations. During normal execution, a program can access the registers, shown in 
Figure 1-5, depending on the program’s access privilege (supervisor or user, determined by 
the privilege-level (PR) bit in the MSR). GPRs and FPRs are accessed through operands 
that are part of the instructions. Access to registers can be explicit (that is, through 
specific instructions for that purpose, such as the Move to Special-Purpose Register (mtspr) 
and Move from Special-Purpose Register (mfspr) instructions) or implicit, as part of 
the execution of an instruction. Some registers can be accessed both explicitly and 
implicitly. 
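
To make explicit access concrete, the sketch below encodes mfspr and mtspr for a given SPR number. The encoding details are assumptions drawn from the PowerPC architecture rather than from this chapter: primary opcode 31, extended opcodes 339 (mfspr) and 467 (mtspr), and a 10-bit SPR field whose two 5-bit halves are swapped in the instruction word. The SPR numbers used in the example (8 for LR, 287 for PVR) are likewise the architecture's values, and the function names are hypothetical.

```python
def _spr_field(spr):
    """Return the 10-bit SPR field with its two 5-bit halves swapped."""
    if not 0 <= spr <= 1023:
        raise ValueError("SPR number must be in 0..1023")
    return ((spr & 0x1F) << 5) | (spr >> 5)


def encode_mfspr(rd, spr):
    """Encode `mfspr rD,SPR` (copy an SPR into a GPR)."""
    if not 0 <= rd <= 31:
        raise ValueError("GPR number must be in 0..31")
    return (31 << 26) | (rd << 21) | (_spr_field(spr) << 11) | (339 << 1)


def encode_mtspr(spr, rs):
    """Encode `mtspr SPR,rS` (copy a GPR into an SPR)."""
    if not 0 <= rs <= 31:
        raise ValueError("GPR number must be in 0..31")
    return (31 << 26) | (rs << 21) | (_spr_field(spr) << 11) | (467 << 1)


if __name__ == "__main__":
    print(f"mfspr r0,8   (LR)  -> {encode_mfspr(0, 8):08X}")
    print(f"mtspr 8,r0   (LR)  -> {encode_mtspr(8, 0):08X}")
    print(f"mfspr r3,287 (PVR) -> {encode_mfspr(3, 287):08X}")
```

Whether a given mfspr or mtspr executes or causes an exception depends on the privilege level of the register and on MSR[PR], which this sketch does not model.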


In the MPC750, all SPRs are 32 bits wide. Table 1-2 describes the architecture-defined 
SPRs implemented by the MPC750. The Programming Environments Manual describes 
these registers in detail, including bit descriptions. Section 2.1.1, “Register Set,” describes 
how these registers are implemented in the MPC750. In particular, that section describes 
which of the features the PowerPC architecture defines as optional are implemented on the 
MPC750. 


Table 1-2. Architecture-Defined SPRs Implemented by the MPC750 

Register | Level | Function 
LR | User | The link register (LR) can be used to provide the branch target address and to hold the return address after branch and link instructions. 
BATs | Supervisor | The architecture defines 16 block address translation registers (BATs), which operate in pairs. There are four pairs of data BATs (DBATs) and four pairs of instruction BATs (IBATs). BATs are used to define and configure blocks of memory. 
CTR | User | The count register (CTR) is decremented and tested by branch-and-count instructions. 
DABR | Supervisor | The optional data address breakpoint register (DABR) supports the data address breakpoint facility. 
DAR | Supervisor | The data address register (DAR) holds the address of an access after an alignment or DSI exception. 
DEC | Supervisor | The decrementer register (DEC) is a 32-bit decrementing counter that provides a way to schedule decrementer exceptions. 
DSISR | Supervisor | The DSISR defines the cause of data access and alignment exceptions. 
EAR | Supervisor | The external access register (EAR) controls access to the external access facility through the External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx) instructions. 
PVR | Supervisor | The processor version register (PVR) is a read-only register that identifies the processor. 
SDR1 | Supervisor | SDR1 specifies the page table format used in virtual-to-physical page address translation. 
SRR0 | Supervisor | The machine status save/restore register 0 (SRR0) saves the address used for restarting an interrupted program when a Return from Interrupt (rfi) instruction executes. 
SRR1 | Supervisor | The machine status save/restore register 1 (SRR1) is used to save machine status on exceptions and to restore machine status when an rfi instruction is executed. 
SPRG0–SPRG3 | Supervisor | SPRG0–SPRG3 are provided for operating system use. 
TB | User: read; Supervisor: read/write | The time base register (TB) is a 64-bit register that maintains the time of day and operates interval timers. The TB consists of two 32-bit fields: time base upper (TBU) and time base lower (TBL). 
XER | User | The XER contains the summary overflow bit, integer carry bit, overflow bit, and a field specifying the number of bytes to be transferred by a Load String Word Indexed (lswx) or Store String Word Indexed (stswx) instruction. 


Table 1-3 describes the SPRs in the MPC750 that are not defined by the 
PowerPC architecture. Section 2.1.2, “MPC750-Specific Registers,” gives detailed 
descriptions of these registers, including bit descriptions. 


Table 1-3. MPC750-Specific Registers 

Register | Level | Function 
HID0 | Supervisor | The hardware implementation-dependent register 0 (HID0) provides checkstop enables and other functions. 
HID1 | Supervisor | The hardware implementation-dependent register 1 (HID1) allows software to read the configuration of the PLL configuration signals. 
IABR | Supervisor | The instruction address breakpoint register (IABR) supports instruction address breakpoint exceptions. It can hold an address to compare with instruction addresses in the IQ. An address match causes an instruction address breakpoint exception. 
ICTC | Supervisor | The instruction cache-throttling control register (ICTC) has bits for controlling the interval at which instructions are fetched into the instruction buffer in the instruction unit. This helps control the MPC750’s overall junction temperature. 
L2CR | Supervisor | The L2 cache control register (L2CR) is used to configure and operate the L2 cache. It has bits for enabling parity checking, setting the L2-to-processor clock ratio, and identifying the type of RAM used for the L2 cache implementation. (The L2 cache feature is not supported in the MPC740.) 
MMCR0–MMCR1 | Supervisor | The monitor mode control registers (MMCR0–MMCR1) are used to enable various performance monitoring interrupt functions. UMMCR0–UMMCR1 provide user-level read access to MMCR0–MMCR1. 
PMC1–PMC4 | Supervisor | The performance monitor counter registers (PMC1–PMC4) are used to count specified events. UPMC1–UPMC4 provide user-level read access to these registers. 
SIA | Supervisor | The sampled instruction address register (SIA) holds the EA of an instruction executing at or around the time the processor signals the performance monitor interrupt condition. The USIA register provides user-level read access to the SIA. 
THRM1, THRM2 | Supervisor | THRM1 and THRM2 provide a way to compare the junction temperature against two user-provided thresholds. The thermal assist unit (TAU) can be operated so that the thermal sensor output is compared to only one threshold, selected in THRM1 or THRM2. 
THRM3 | Supervisor | THRM3 is used to enable the TAU and to control the output sample time. 
UMMCR0–UMMCR1 | User | The user monitor mode control registers (UMMCR0–UMMCR1) provide user-level read access to MMCR0–MMCR1. 
UPMC1–UPMC4 | User | The user performance monitor counter registers (UPMC1–UPMC4) provide user-level read access to PMC1–PMC4. 
USIA | User | The user sampled instruction address register (USIA) provides user-level read access to the SIA register. 





1.5 Instruction Set 


All PowerPC instructions are encoded as single-word (32-bit) opcodes. Instruction formats 
are consistent among all instruction types, permitting efficient decoding to occur in parallel 
with operand accesses. This fixed instruction length and consistent format greatly simplifies 
instruction pipelining. 


For more information, see Chapter 2, “Programming Model.” 


1.5.1 PowerPC Instruction Set 


The PowerPC instructions are divided into the following categories: 

• Integer instructions—These include computational and logical instructions. 
— Integer arithmetic instructions 
— Integer compare instructions 
— Integer logical instructions 
— Integer rotate and shift instructions 

• Floating-point instructions—These include floating-point computational 
instructions, as well as instructions that affect the FPSCR. 
— Floating-point arithmetic instructions 
— Floating-point multiply/add instructions 
— Floating-point rounding and conversion instructions 
— Floating-point compare instructions 
— Floating-point status and control instructions 

• Load/store instructions—These include integer and floating-point load and store 
instructions. 
— Integer load and store instructions 
— Integer load and store multiple instructions 
— Floating-point load and store 
— Primitives used to construct atomic memory operations (lwarx and stwcx. 
instructions) 

• Flow control instructions—These include branching instructions, condition register 
logical instructions, trap instructions, and other instructions that affect the 
instruction flow. 
— Branch and trap instructions 
— Condition register logical instructions 

• Processor control instructions—These instructions are used for synchronizing 
memory accesses and management of caches, TLBs, and the segment registers. 
— Move to/from SPR instructions 
— Move to/from MSR 
— Synchronize 
— Instruction synchronize 
— Order loads and stores 

• Memory control instructions—These instructions provide control of caches, TLBs, 
and SRs. 
— Supervisor-level cache management instructions 
— User-level cache instructions 
— Segment register manipulation instructions 
— Translation lookaside buffer management instructions 


This grouping does not indicate the execution unit that executes a particular instruction or 
group of instructions. 


Integer instructions operate on byte, half-word, and word operands. Floating-point 
instructions operate on single-precision (one word) and double-precision (one double 
word) floating-point operands. The PowerPC architecture uses instructions that are four 
bytes long and word-aligned. It provides for byte, half-word, and word operand loads and 
stores between memory and a set of 32 GPRs. It also provides for word and double-word 
operand loads and stores between memory and a set of 32 floating-point registers (FPRs). 


Computational instructions do not modify memory. To use a memory operand in a 
computation and then modify the same or another memory location, the memory contents 
must be loaded into a register, modified, and then written back to the target location with 
distinct instructions. 


Processors in this family follow the program flow when they are in the normal execution 
state. However, the flow of instructions can be interrupted directly by the execution of an 
instruction or by an asynchronous event. Either kind of exception may cause one of several 
components of the system software to be invoked. 


Effective address computations for both data and instruction accesses use 32-bit unsigned 
binary arithmetic. A carry from bit 0 is ignored in 32-bit implementations. 
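
The wraparound behavior can be sketched as follows. The function names are hypothetical, and the sign extension of a 16-bit displacement is an assumption taken from the architecture's load/store addressing forms rather than from this paragraph; the only property illustrated from the text is that the sum is truncated to 32 bits and the carry out of bit 0 is discarded.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(d):
    """Interpret the low 16 bits of d as a signed displacement."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def ea_displacement(base, d):
    """Register indirect with immediate index: EA = (base + EXTS(d)) mod 2**32."""
    return (base + sign_extend16(d)) & MASK32


def ea_indexed(base, index):
    """Register indirect with index: EA = (base + index) mod 2**32."""
    return (base + index) & MASK32


if __name__ == "__main__":
    # The carry out of the most-significant bit is dropped, so the address wraps.
    print(f"{ea_displacement(0xFFFFFFF0, 0x20):08X}")
    print(f"{ea_indexed(0xFFFF0000, 0x00020000):08X}")
```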


1.5.2 MPC750 Microprocessor Instruction Set 


The MPC750 instruction set is defined as follows: 

• The MPC750 provides hardware support for all 32-bit PowerPC instructions. 

• The MPC750 implements the following instructions optional to the PowerPC 
architecture: 


— External Control In Word Indexed (eciwx) 

— External Control Out Word Indexed (ecowx) 

— Floating Select (fsel) 

— Floating Reciprocal Estimate Single-Precision (fres) 
— Floating Reciprocal Square Root Estimate (frsqrte) 
— Store Floating-Point as Integer Word (stfiwx) 


1.6 On-Chip Cache Implementation 


The following subsections describe the PowerPC architecture’s treatment of cache in 
general, and the MPC750-specific implementation, respectively. A detailed description of 
the MPC750 cache implementation is provided in Chapter 3, “L1 Instruction and Data 
Cache Operation.” 


1.6.1 PowerPC Cache Model 


The PowerPC architecture does not define hardware aspects of cache implementations. For 
example, processors can have unified caches, separate instruction and data caches (Harvard 
architecture), or no cache at all. The microprocessors control the following memory access 
modes on a page or block basis: 


• Write-back/write-through mode 
• Caching-inhibited mode 
• Memory coherency 


The caches are physically addressed, and the data cache can operate in either write-back or 
write-through mode as specified by the PowerPC architecture. 


The PowerPC architecture defines the term ‘cache block’ as the cacheable unit. The VEA 
and OEA define cache management instructions a programmer can use to affect cache 
contents. 


1.6.2 MPC750 Microprocessor Cache Implementation 


The MPC750 cache implementation is described in Section 1.2.4, “On-Chip Instruction 
and Data Caches,” and Section 1.2.5, “L2 Cache Implementation (Not Supported in the 
MPC740).” The BPU also contains a 64-entry BTIC that provides immediate access to 
cached target instructions. For more information, see Section 1.2.2.2, “Branch Processing 
Unit (BPU).” 


1.7 Exception Model 


The following sections describe the PowerPC exception model and the MPC750 
implementation. A detailed description of the MPC750 exception model is provided in 
Chapter 4, “Exceptions.” 


1.7.1 PowerPC Exception Model 


The PowerPC exception mechanism allows the processor to interrupt the instruction flow 
to handle certain situations caused by external signals, errors, or unusual conditions arising 
from the instruction execution. When exceptions occur, information about the state of the 
processor is saved to certain registers and the processor begins execution at an address 
(exception vector) predetermined for each exception. Exception processing occurs in 
supervisor mode. 


Although multiple exception conditions can map to a single exception vector, a more 
specific condition may be determined by examining a register associated with the 
exception—for example, the DSISR and the FPSCR. Additionally, some exception 
conditions can be enabled or disabled explicitly by software. 


The PowerPC architecture requires that exceptions be handled in program order; therefore, 
although a particular implementation may recognize exception conditions out of order, they 
are handled in order. When an instruction-caused exception is recognized, any unexecuted 
instructions that appear earlier in the instruction stream, including any that are 
undispatched, are required to complete before the exception is taken, and any exceptions 
those instructions cause must also be handled first. Likewise, asynchronous, precise 
exceptions are recognized when they occur, but are not handled until the instructions 
currently in the completion queue successfully retire or generate an exception, and the 
completion queue is emptied. 


Unless a catastrophic condition causes a system reset or machine check exception, only one 
exception is handled at a time. For example, if one instruction encounters multiple 
exception conditions, those conditions are handled sequentially. After the exception handler 
handles an exception, the instruction processing continues until the next exception 
condition is encountered. Recognizing and handling exception conditions sequentially 
guarantees that exceptions are recoverable. 


When an exception is taken, information about the processor state before the exception was 
taken is saved in SRR0 and SRR1. Exception handlers should save the information stored 
in SRR0 and SRR1 early, before enabling external interrupts, to prevent the program state 
from being lost due to a system reset or machine check exception or to an 
instruction-caused exception in the exception handler. 


The PowerPC architecture supports four types of exceptions: 


• Synchronous, precise—These are caused by instructions. All instruction-caused 
exceptions are handled precisely; that is, the machine state at the time the exception 
occurs is known and can be completely restored. This means that (excluding the trap 
and system call exceptions) the address of the faulting instruction is provided to the 
exception handler and that neither the faulting instruction nor subsequent 
instructions in the code stream will complete execution before the exception is 
taken. Once the exception is processed, execution resumes at the address of the 
faulting instruction (or at an alternate address provided by the exception handler). 
When an exception is taken due to a trap or system call instruction, execution 
resumes at an address provided by the handler. 

• Synchronous, imprecise—The PowerPC architecture defines two imprecise 
floating-point exception modes, recoverable and nonrecoverable. Even though the 
MPC750 provides a means to enable the imprecise modes, it implements these 
modes identically to the precise mode (that is, enabled floating-point exceptions are 
always precise). 

• Asynchronous, maskable—The PowerPC architecture defines external and 
decrementer interrupts as maskable, asynchronous exceptions. When these 
exceptions occur, their handling is postponed until the next instruction, and any 
exceptions associated with that instruction, completes execution. If no instructions 
are in the execution units, the exception is taken immediately upon determination of 
the correct restart address (for loading SRR0). As shown in Table 1-4, the MPC750 
implements additional asynchronous, maskable exceptions. 

• Asynchronous, nonmaskable—There are two nonmaskable asynchronous 
exceptions: system reset and the machine check exception. These exceptions may 
not be recoverable, or may provide a limited degree of recoverability. Exceptions 
report recoverability through the MSR[RI] bit. 


1.7.2 MPC750 Microprocessor Exception Implementation 


The MPC750 exception classes described above are shown in Table 1-4. 


Table 1-4. MPC750 Microprocessor Exception Classifications 

Synchronous/Asynchronous | Precise/Imprecise | Exception Type 
Asynchronous, nonmaskable | Imprecise | Machine check, system reset 
Asynchronous, maskable | Precise | External, decrementer, system management, performance monitor, and thermal management interrupts 
Synchronous | Precise | Instruction-caused exceptions 


Although exceptions have other characteristics, such as priority and recoverability, 
Table 1-4 describes categories of exceptions the MPC750 handles uniquely. Table 1-4 
includes no synchronous imprecise exceptions; although the PowerPC architecture 
supports imprecise handling of floating-point exceptions, the MPC750 implements these 
exception modes precisely. Table 1-5 lists MPC750 exceptions and conditions that cause 
them. Exceptions specific to the MPC750 are indicated. 





Table 1-5. Exceptions and Conditions 

Exception Type | Vector Offset (hex) | Causing Conditions 
System reset | 00100 | Assertion of either HRESET or SRESET or at power-on reset 
Machine check | 00200 | Assertion of TEA during a data bus transaction, assertion of MCP, or an address, data, or L2 bus parity error. MSR[ME] must be set. 
DSI | 00300 | As specified in the PowerPC architecture. For TLB misses on load, store, or cache operations, a DSI exception occurs if a page fault occurs. 
ISI | 00400 | As defined by the PowerPC architecture. 
External interrupt | 00500 | MSR[EE] = 1 and INT is asserted. 
Alignment | 00600 | • A floating-point load/store, stmw, stwcx., lmw, lwarx, eciwx or ecowx instruction operand is not word-aligned. • A multiple/string load/store operation is attempted in little-endian mode. • The operand of dcbz is in memory that is write-through-required or caching-inhibited or the cache is disabled. 
Program | 00700 | As defined by the PowerPC architecture. 
Floating-point unavailable | 00800 | As defined by the PowerPC architecture. 
Decrementer | 00900 | As defined by the PowerPC architecture, when the most significant bit of the DEC register changes from 0 to 1 and MSR[EE] = 1. 
System call | 00C00 | Execution of the System Call (sc) instruction. 
Trace | 00D00 | MSR[SE] = 1 or a branch instruction completes and MSR[BE] = 1. Unlike the architecture definition, isync does not cause a trace exception. 
Reserved | 00E00 | The MPC750 does not generate an exception to this vector. Other processors may use this vector for floating-point assist exceptions. 
Performance monitor¹ | 00F00 | The limit specified in a PMC register is reached and MMCR0[ENINT] = 1. 
Instruction address breakpoint¹ | 01300 | IABR[0–29] matches EA[0–29] of the next instruction to complete, IABR[TE] matches MSR[IR], and IABR[BE] = 1. 
System management interrupt¹ | 01400 | MSR[EE] = 1 and SMI is asserted. 
Thermal management interrupt¹ | 01700 | Thermal management is enabled, the junction temperature exceeds the threshold specified in THRM1 or THRM2, and MSR[EE] = 1. 

Note: 
¹ MPC750-specific 
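
Two of the causing conditions in Table 1-5 are simple enough to express as predicates, sketched below. The bit positions are assumptions not stated in this table: bit 0 is taken as the most-significant bit of a 32-bit register, MSR[EE] and MSR[IR] are placed at bits 16 and 26, and IABR[BE] and IABR[TE] at bits 30 and 31. All function and constant names are hypothetical, and each predicate only reports whether the condition holds at one instant; it does not model pending or masked exceptions.

```python
MASK32 = 0xFFFFFFFF


def ibm_bit(n):
    """Mask for bit n of a 32-bit register, where bit 0 is the most significant."""
    return 1 << (31 - n)


MSR_EE = ibm_bit(16)       # external interrupt enable (assumed position)
MSR_IR = ibm_bit(26)       # instruction address translation (assumed position)
IABR_BE = ibm_bit(30)      # breakpoint enable (assumed position)
IABR_TE = ibm_bit(31)      # translation enable (assumed position)
IABR_ADDRESS = 0xFFFFFFFC  # bits 0-29: word address


def dec_tick(dec):
    """Decrement the 32-bit DEC register by one, wrapping below zero."""
    return (dec - 1) & MASK32


def decrementer_condition(dec_before, dec_after, msr):
    """True when the DEC most-significant bit goes 0 -> 1 and MSR[EE] = 1."""
    msb_rose = not (dec_before & ibm_bit(0)) and bool(dec_after & ibm_bit(0))
    return msb_rose and bool(msr & MSR_EE)


def iabr_condition(iabr, next_ea, msr):
    """True when IABR[0-29] = EA[0-29], IABR[TE] = MSR[IR], and IABR[BE] = 1."""
    same_word = (iabr & IABR_ADDRESS) == (next_ea & IABR_ADDRESS)
    te_matches_ir = bool(iabr & IABR_TE) == bool(msr & MSR_IR)
    return same_word and te_matches_ir and bool(iabr & IABR_BE)


if __name__ == "__main__":
    print(decrementer_condition(0, dec_tick(0), MSR_EE))
    print(iabr_condition(0x1000 | IABR_BE, 0x1000, 0))
```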


1.8 Memory Management 


The following subsections describe the memory management features of the PowerPC 
architecture, and the MPC750 implementation, respectively. A detailed description of the 
MPC750 MMU implementation is provided in Chapter 5, “Memory Management.” 


1.8.1 PowerPC Memory Management Model 


The primary functions of the MMU are to translate logical (effective) addresses to physical 
addresses for memory accesses and to provide access protection on blocks and pages of 
memory. There are two types of accesses generated by the MPC750 that require address 
translation—instruction accesses, and data accesses to memory generated by load, store, 
and cache control instructions. 


The PowerPC architecture defines different resources for 32- and 64-bit processors; the 
MPC750 implements the 32-bit memory management model. The memory-management 
model provides 4 Gbytes of logical address space accessible to supervisor and user 
programs with a 4-Kbyte page size and 256-Mbyte segment size. BAT block sizes range 
from 128 Kbyte to 256 Mbyte and are software selectable. In addition, it defines an interim 
52-bit virtual address and hashed page tables for generating 32-bit physical addresses. 
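
The sizes quoted above determine how a 32-bit effective address divides into fields, which the sketch below illustrates. A 256-Mbyte segment leaves 4 bits to select one of sixteen segment registers, and a 4-Kbyte page leaves a 16-bit page index and a 12-bit byte offset. The 24-bit virtual segment ID (VSID) read from the segment register is an assumption taken from the architecture, not from this paragraph; it is what makes the virtual address 52 bits wide. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_ea(ea):
    """Split a 32-bit effective address into (segment, page index, byte offset)."""
    ea &= MASK32
    segment = ea >> 28                # 4 bits: one of sixteen 256-Mbyte segments
    page_index = (ea >> 12) & 0xFFFF  # 16 bits: 4-Kbyte page within the segment
    offset = ea & 0xFFF               # 12 bits: byte within the page
    return segment, page_index, offset


def virtual_address(ea, segment_vsids):
    """Form the 52-bit virtual address: VSID (24) | page index (16) | offset (12).

    segment_vsids is a hypothetical list of sixteen VSID values, one per
    segment register.
    """
    segment, page_index, offset = split_ea(ea)
    vsid = segment_vsids[segment] & 0xFFFFFF
    return (vsid << 28) | (page_index << 12) | offset


if __name__ == "__main__":
    vsids = [0] * 16
    vsids[1] = 0xABCDEF
    print(split_ea(0x12345678))
    print(f"{virtual_address(0x12345678, vsids):013X}")
```

The virtual page number (VSID plus page index, 40 bits) is what the hashed page table maps to a physical page number; the byte offset passes through unchanged.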


The architecture also provides independent four-entry BAT arrays for instructions and data 
that maintain address translations for blocks of memory. These entries define blocks that 
can vary from 128 Kbytes to 256 Mbytes. The BAT arrays are maintained by system 
software. 


The PowerPC MMU and exception model support demand-paged virtual memory. Virtual 
memory management permits execution of programs larger than the size of physical 
memory; demand-paged implies that individual pages are loaded into physical memory 
from system memory only when they are first accessed by an executing program. 


The hashed page table is a variable-sized data structure that defines the mapping between 
virtual page numbers and physical page numbers. The page table size is a power of 2, and 
its starting address is a multiple of its size. The page table contains a number of page table 
entry groups (PTEGs). A PTEG contains eight page table entries (PTEs) of eight bytes 
each; therefore, each PTEG is 64 bytes long. PTEG addresses are entry points for table 
search operations. 
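
The size and alignment rules above can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical function names; it encodes only the constraints stated in the paragraph (power-of-2 size, base aligned to the size, eight 8-byte PTEs per 64-byte PTEG) and does not model the hash function or any minimum table size.

```python
PTE_BYTES = 8
PTES_PER_PTEG = 8
PTEG_BYTES = PTE_BYTES * PTES_PER_PTEG  # 64 bytes


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def describe_page_table(base, size):
    """Validate a page table placement and count its PTEGs and PTEs."""
    if not is_power_of_two(size) or size < PTEG_BYTES:
        raise ValueError("page table size must be a power of 2 of at least one PTEG")
    if base % size != 0:
        raise ValueError("page table base must be a multiple of its size")
    ptegs = size // PTEG_BYTES
    return {"ptegs": ptegs, "ptes": ptegs * PTES_PER_PTEG}


def pteg_address(base, size, index):
    """Address of PTEG number `index`, the entry point for a table search."""
    ptegs = describe_page_table(base, size)["ptegs"]
    return base + (index % ptegs) * PTEG_BYTES


if __name__ == "__main__":
    print(describe_page_table(0x00100000, 0x10000))
    print(hex(pteg_address(0x00100000, 0x10000, 3)))
```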


Setting MSR[IR] enables instruction address translation, and setting MSR[DR] enables data 
address translation. If either bit is cleared, the corresponding effective address is used 
directly as the physical address. 
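
A minimal sketch of this selection follows. The bit positions of MSR[IR] and MSR[DR] (bits 26 and 27, with bit 0 as the most-significant bit) are assumptions taken from the architecture's MSR definition, and `translate` is a hypothetical callback standing in for the BAT and page table lookup, which is not modeled here.

```python
MASK32 = 0xFFFFFFFF


def msr_bit(n):
    """Mask for MSR bit n, where bit 0 is the most significant of 32 bits."""
    return 1 << (31 - n)


MSR_IR = msr_bit(26)  # instruction address translation enable (assumed position)
MSR_DR = msr_bit(27)  # data address translation enable (assumed position)


def physical_address(ea, msr, is_instruction, translate):
    """Return the physical address for an access.

    With the relevant enable bit clear, the effective address is used as is;
    otherwise the hypothetical `translate` callback performs the translation.
    """
    enable = MSR_IR if is_instruction else MSR_DR
    if not msr & enable:
        return ea & MASK32
    return translate(ea & MASK32)


if __name__ == "__main__":
    # Stand-in translation that simply drops the top four address bits.
    drop_segment = lambda ea: ea & 0x0FFFFFFF
    print(f"{physical_address(0xC0001234, 0, False, drop_segment):08X}")
    print(f"{physical_address(0xC0001234, MSR_DR, False, drop_segment):08X}")
```

Because the two bits are independent, instruction fetches can run translated while data accesses are untranslated, or the reverse.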


1.8.2 MPC750 Microprocessor Memory Management 
Implementation 


The MPC750 implements separate MMUs for instructions and data. It implements a copy 
of the segment registers in the instruction MMU; however, read and write accesses (mfsr 
and mtsr) are handled through the segment registers implemented as part of the data MMU. 
The MPC750 MMU is described in Section 1.2.3, “Memory Management Units (MMUs).” 


The R (referenced) bit is updated in the PTE in memory (if necessary) during a table search 
due to a TLB miss. Updates to the C (changed) bit are treated like TLB misses. A complete 
table search is performed and the entire TLB entry is rewritten to update the C bit. 


1.9 Instruction Timing 


The MPC750 is a pipelined, superscalar processor. A pipelined processor is one in which 
instruction processing is divided into discrete stages, allowing work to be done on different 
instructions in each stage. For example, after an instruction completes one stage, it can pass 
on to the next stage leaving the previous stage available to the subsequent instruction. This 
improves overall instruction throughput. 


A superscalar processor is one that issues multiple independent instructions into separate 
execution units, allowing instructions to execute in parallel. The MPC750 has six 
independent execution units, two for integer instructions, and one each for floating-point 
instructions, branch instructions, load/store instructions, and system register instructions. 
Having separate GPRs and FPRs allows integer, floating-point calculations, and load and 
store operations to occur simultaneously without interference. Additionally, rename buffers 
are provided to allow operations to post execution results for use by subsequent instructions 
without committing them to the architected FPRs and GPRs. 


As shown in Figure 1-6, the common pipeline of the MPC750 has four stages through 
which all instructions must pass—fetch, decode/dispatch, execute, and complete/write 
back. Some instructions occupy multiple stages simultaneously and some individual 
execution units have additional stages. For example, the floating-point pipeline consists of 
three stages through which all floating-point instructions must pass. 


Figure 1-6. Pipeline Diagram 
[Figure annotations: fetch, maximum four instructions per clock cycle; dispatch, maximum three instructions per clock cycle (includes one branch instruction); execute stage; complete (write-back), maximum two instructions per clock cycle.] 


Note that Figure 1-6 does not show features such as reservation stations and rename buffers, 
which reduce stalls and improve instruction throughput. 


The instruction pipeline in the MPC750 has four major pipeline stages, described as 
follows: 


The fetch pipeline stage primarily involves retrieving instructions from the memory 
system and determining the location of the next instruction fetch. The BPU decodes 
branches during the fetch stage and removes those that do not update CTR or LR 
from the instruction stream. 


The dispatch stage is responsible for decoding the instructions supplied by the 
instruction fetch stage and determining which instructions can be dispatched in the 
current cycle. If source operands for the instruction are available, they are read from 
the appropriate register file or rename register to the execute pipeline stage. If a 
source operand is not available, dispatch provides a tag that indicates which rename 
register will supply the operand when it becomes available. At the end of the 
dispatch stage, the dispatched instructions and their operands are latched by the 
appropriate execution unit. 


Instructions executed by the IUs, FPU, SRU, and LSU are dispatched from the 
bottom two positions in the instruction queue. In a single clock cycle, a maximum 
of two instructions can be dispatched to these execution units in any combination. 
When an instruction is dispatched, it is assigned a position in the six-entry 
completion queue. A branch instruction can be issued on the same clock cycle for a 
maximum three-instruction dispatch. 


During the execute pipeline stage, each execution unit that has an executable 
instruction executes the selected instruction (perhaps over multiple cycles), writes 
the instruction's result into the appropriate rename register, and notifies the 
completion stage that the instruction has finished execution. In the case of an internal 
exception, the execution unit reports the exception to the completion pipeline stage 
and (except for the FPU) discontinues instruction execution until the exception is 
handled. The exception is not signaled until that instruction is the next to be 
completed. Execution of most floating-point instructions is pipelined within the FPU 
allowing up to three instructions to be executing in the FPU concurrently. The FPU 
stages are multiply, add, and round-convert. Execution of most load/store 
instructions is also pipelined. The load/store unit has two pipeline stages. The first 
stage is for effective address calculation and MMU translation and the second stage 
is for accessing the data in the cache. 


The complete pipeline stage maintains the correct architectural machine state and 
transfers execution results from the rename registers to the GPRs and FPRs (and 
CTR and LR, for some instructions) as instructions are retired. As with dispatching 
instructions from the instruction queue, instructions are retired from the two bottom 
positions in the completion queue. If completion logic detects an instruction causing 
an exception, all following instructions are cancelled, their execution results in 
rename registers are discarded, and instructions are fetched from the appropriate 
exception vector. 
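
The stage widths quoted above bound the best-case throughput, which the toy occupancy model below illustrates. It is a sketch under strong simplifying assumptions, none of which describe the real MPC750: every instruction is a non-branch that executes in a single cycle, there are no data dependencies, cache misses, or stalls, and the instruction queue is assumed to hold six entries (a value not given in this passage). Only the four-per-cycle fetch, two-per-cycle dispatch, two-per-cycle completion, and six-entry completion queue come from the text; the function name is hypothetical.

```python
def cycles_to_retire(n, fetch_width=4, dispatch_width=2, complete_width=2,
                     iq_size=6, cq_size=6):
    """Count clock cycles until n simple instructions have all retired."""
    fetched = retired = 0
    iq = 0          # instructions waiting in the instruction queue
    executing = 0   # dispatched last cycle, executing during this cycle
    finished = 0    # finished executing, waiting in the completion queue
    cycles = 0
    while retired < n:
        cycles += 1
        # Complete: retire instructions that finished in an earlier cycle.
        done = min(complete_width, finished)
        finished -= done
        retired += done
        # Dispatch: limited by width, queue contents, and free completion-queue entries.
        free_cq = cq_size - (finished + executing)
        dispatched = min(dispatch_width, iq, free_cq)
        iq -= dispatched
        # Execute: instructions dispatched last cycle finish at the end of this one.
        finished += executing
        executing = dispatched
        # Fetch: refill the instruction queue.
        incoming = min(fetch_width, n - fetched, iq_size - iq)
        iq += incoming
        fetched += incoming
    return cycles


if __name__ == "__main__":
    for count in (1, 4, 10):
        print(count, cycles_to_retire(count))
```

In this model a single instruction takes four cycles to pass through fetch, dispatch, execute, and complete, and a long run settles at two instructions retired per cycle because dispatch and completion are the narrowest stages.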


Because the PowerPC architecture can be applied to such a wide variety of 
implementations, instruction timing varies among processors of this family. 


For a detailed discussion of instruction timing with examples and a table of latencies for 
each execution unit, see Chapter 6, “Instruction Timing.” 


1.10 Power Management 


The MPC750 provides four power modes, selectable by setting the appropriate control bits 
in the MSR and HID registers. The four power modes are as follows: 


• Full-power—This is the default power state of the MPC750. The MPC750 is fully 
powered and the internal functional units are operating at the full processor clock 
speed. If the dynamic power management mode is enabled, functional units that are 
idle will automatically enter a low-power state without affecting performance, 
software execution, or external hardware. 


• Doze—All the functional units of the MPC750 are disabled except for the time 
base/decrementer registers and the bus snooping logic. When the processor is in 
doze mode, an external asynchronous interrupt, a system management interrupt, a 
decrementer exception, a hard or soft reset, or a machine check brings the MPC750 
into the full-power state. The MPC750 in doze mode maintains the PLL in a fully 
powered state and locked to the system external clock input (SYSCLK) so a 
transition to the full-power state takes only a few processor clock cycles. 


• Nap—The nap mode further reduces power consumption by disabling bus snooping, 
leaving only the time base register and the PLL in a powered state. The MPC750 
returns to the full-power state upon receipt of an external asynchronous interrupt, a 
system management interrupt, a decrementer exception, a hard or soft reset, or a 
machine check input (MCP). A return to full-power state from a nap state takes only 
a few processor clock cycles. When the processor is in nap mode, if QACK is 
negated, the processor is put in doze mode to support snooping. 


• Sleep—Sleep mode minimizes power consumption by disabling all internal 
functional units, after which external system logic may disable the PLL and 
SYSCLK. Returning the MPC750 to the full-power state requires the enabling of the 
PLL and SYSCLK, followed by the assertion of an external asynchronous interrupt, 
a system management interrupt, a hard or soft reset, or a machine check input (MCP) 
signal after the time required to relock the PLL. 


Chapter 10, “Power and Thermal Management,” provides information about power saving 
and thermal management modes for the MPC750. 
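
The mode descriptions above can be summarized in a small model. The following C sketch is illustrative only: it assumes the bit positions given in Table 2-1 (MSR[POW], bit 13) and Table 2-4 (HID0[DOZE], HID0[NAP], and HID0[SLEEP], bits 8 through 10), with bit 0 as the most-significant bit. The type and function names are hypothetical, and the code operates on plain integer images rather than on the real registers.

```c
#include <stdint.h>
#include <stdbool.h>

/* PowerPC numbers bits from the most-significant end: bit 0 is the MSB. */
#define PPC_BIT32(n) ((uint32_t)1u << (31 - (n)))

#define MSR_POW    PPC_BIT32(13)  /* Table 2-1 */
#define HID0_DOZE  PPC_BIT32(8)   /* Table 2-4 */
#define HID0_NAP   PPC_BIT32(9)
#define HID0_SLEEP PPC_BIT32(10)

typedef enum { MODE_FULL_POWER, MODE_DOZE, MODE_NAP, MODE_SLEEP } power_mode_t;

typedef enum {
    EV_EXTERNAL_INTERRUPT,
    EV_SYSTEM_MANAGEMENT_INTERRUPT,
    EV_DECREMENTER,
    EV_RESET,           /* hard or soft reset */
    EV_MACHINE_CHECK
} wake_event_t;

/* Return a HID0 image with only the requested mode-select bit set.
 * Clearing the other two select bits is a choice made in this sketch. */
uint32_t hid0_select_mode(uint32_t hid0, power_mode_t mode)
{
    hid0 &= ~(HID0_DOZE | HID0_NAP | HID0_SLEEP);
    switch (mode) {
    case MODE_DOZE:  hid0 |= HID0_DOZE;  break;
    case MODE_NAP:   hid0 |= HID0_NAP;   break;
    case MODE_SLEEP: hid0 |= HID0_SLEEP; break;
    default:         break;   /* full power: no select bit */
    }
    return hid0;
}

/* True if the event returns the processor to full power from the given mode. */
bool event_wakes(power_mode_t mode, wake_event_t ev)
{
    if (mode == MODE_FULL_POWER)
        return false;   /* nothing to wake from */
    if (mode == MODE_SLEEP && ev == EV_DECREMENTER)
        return false;   /* the sleep description lists no decrementer wake-up */
    return true;
}
```

In this model, software would write the selected HID0 image and then set MSR_POW in the MSR image to request the mode; the actual entry sequence is covered in Chapter 10.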


1.11 Thermal Management 


The MPC750’s thermal assist unit (TAU) provides a way to control heat dissipation. This 
ability is particularly useful in portable computers, which, due to power consumption and 
size limitations, cannot use desktop cooling solutions such as fans. Therefore, better heat 
sink designs coupled with intelligent thermal management are of critical importance for 
high-performance portable systems. 


Primarily, the thermal management system monitors and regulates the system’s operating 
temperature. For example, if the temperature is about to exceed a set limit, the system can 
be made to slow down or even suspend operations temporarily in order to lower the 
temperature. 


The thermal management facility also ensures that the processor’s junction temperature 
does not exceed the operating specification. To avoid the inaccuracies that arise from 
measuring junction temperature with an external thermal sensor, the MPC750’s on-chip 
thermal sensor and logic tightly couple the thermal management implementation. 


The TAU consists of a thermal sensor, digital-to-analog converter, comparator, control 
logic, and the dedicated SPRs described in Section 1.4, “PowerPC Registers and 
Programming Model.” The TAU does the following: 


• Compares the junction temperature against user-programmable thresholds 
• Generates a thermal management interrupt if the temperature crosses the threshold 
• Enables the user to estimate the junction temperature by way of a software 
successive approximation routine 


The TAU is controlled through the privileged mtspr/mfspr instructions to the three SPRs 
provided for configuring and controlling the sensor control logic, which function as 
follows: 


• THRM1 and THRM2 provide the ability to compare the junction temperature 
against two user-provided thresholds. Having dual thresholds gives the thermal 
management software finer control of the junction temperature. In single threshold 
mode, the thermal sensor output is compared to only one threshold in either THRM1 
or THRM2. 


• THRM3 is used to enable the TAU and to control the comparator output sample 
time. The thermal management logic manages the thermal management interrupt 
generation and time multiplexed comparisons in the dual threshold mode as well as 
other control functions. 
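
The software successive approximation routine mentioned above can be pictured as a binary search over the threshold value. The sketch below is a minimal illustration, not the routine described in Chapter 10: the callback type `tau_compare_fn` is hypothetical and stands in for programming a threshold into THRM1 or THRM2 and reading back the comparator result, and the search range is whatever the caller passes in.

```c
#include <stdbool.h>

/* Hypothetical comparator hook: returns true if the junction temperature is
 * at or above `threshold`. On real hardware this would involve writing the
 * threshold to THRM1 or THRM2 and waiting for a valid comparison result. */
typedef bool (*tau_compare_fn)(int threshold, void *ctx);

/* Binary search for the largest threshold in [lo, hi] that the temperature
 * still meets or exceeds. Returns lo - 1 if the temperature is below lo. */
int tau_estimate(tau_compare_fn at_or_above, void *ctx, int lo, int hi)
{
    int best = lo - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (at_or_above(mid, ctx)) {
            best = mid;       /* temperature >= mid: search the upper half */
            lo = mid + 1;
        } else {
            hi = mid - 1;     /* temperature < mid: search the lower half */
        }
    }
    return best;
}

/* Stand-in comparator for experimentation: ctx points at a fixed temperature. */
bool fake_compare(int threshold, void *ctx)
{
    return *(const int *)ctx >= threshold;
}
```

Each iteration halves the remaining range, so a range of 128 values needs at most seven or eight comparisons; on hardware, each comparison would also have to wait for the comparator output sample time configured in THRM3.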


Instruction cache throttling provides control of the MPC750’s overall junction temperature 
by determining the interval at which instructions are fetched. This feature is accessed 
through the ICTC register. 


Chapter 10, “Power and Thermal Management,” provides information about power saving 
and thermal management modes for the MPC750. 


1.12 Performance Monitor 


The MPC750 incorporates a performance monitor facility that system designers can use to 
help bring up, debug, and optimize software performance. The performance monitor counts 
events during execution of code, relating to dispatch, execution, completion, and memory 
accesses. 


The performance monitor incorporates several registers that can be read and written to by 
supervisor-level software. User-level versions of these registers provide read-only access 
for user-level applications. These registers are described in Section 1.4, “PowerPC 
Registers and Programming Model.” Performance monitor control registers, MMCR0 or 
MMCR1, can be used to specify which events are to be counted and the conditions for 
which a performance monitoring interrupt is taken. Additionally, the sampled instruction 
address register, SIA (USIA), holds the address of the first instruction to complete after the 
counter overflowed. 


Attempting to write to a user-read-only performance monitor register causes a program 
exception, regardless of the MSR[PR] setting. 


When a performance monitoring interrupt occurs, program execution continues from 
vector offset 0x00F00. 


Chapter 11, “Performance Monitor,” describes the operation of the performance monitor 
diagnostic tool incorporated in the MPC750. 
Chapter 2 
Programming Model 


This chapter describes the MPC750 programming model, emphasizing those features 
specific to the MPC750 processor and summarizing those that are common to the 
processors that implement the PowerPC architecture. It consists of three major sections, 
which describe the following: 


• Registers implemented in the MPC750 
• Operand conventions 
• The MPC750 instruction set 


For detailed information about architecture-defined features, see the Programming 
Environments Manual. 


Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions 
for the MPC750 apply for the MPC755 except as noted in Appendix C, “MPC755 
Embedded G3 Microprocessor.” 


2.1 The MPC750 Processor Register Set 


This section describes the registers implemented in the MPC750. It includes an overview 
of registers defined by the PowerPC architecture, highlighting differences in how these 
registers are implemented in the MPC750, and a detailed description of MPC750-specific 
registers. Full descriptions of the architecture-defined register set are provided in Chapter 2, 
“PowerPC Register Set,” in the Programming Environments Manual. 


Registers are defined at all three levels of the PowerPC architecture—user instruction set 
architecture (UISA), virtual environment architecture (VEA), and operating environment 
architecture (OEA). The PowerPC architecture defines register-to-register operations for all 
computational instructions. Source data for these instructions are accessed from the on-chip 
registers or are provided as immediate values embedded in the opcode. The three-register 
instruction format allows specification of a target register distinct from the two source 
registers, thus preserving the original data for use by other instructions and reducing the 
number of instructions required for certain operations. Data is transferred between memory 
and registers with explicit load and store instructions only. 
2.1.1 Register Set 


The PowerPC UISA registers are user-level. General-purpose registers (GPRs) and 
floating-point registers (FPRs) are accessed through instruction operands. Access to 
registers can be explicit (by using instructions for that purpose such as Move to 
Special-Purpose Register (mtspr) and Move from Special-Purpose Register (mfspr) 
instructions) or implicit as part of the execution of an instruction. Some registers are 
accessed both explicitly and implicitly. 


The registers implemented on the MPC750 are shown in Figure 2-1. The number to the 
right of the special-purpose registers (SPRs) indicates the number that is used in the syntax 
of the instruction operands to access the register (for example, the number used to access 
the integer exception register (XER) is SPR 1). These registers can be accessed using the 
mtspr and mfspr instructions. 
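
To make the role of the SPR number concrete, the following C sketch encodes mfspr and mtspr instruction words for a given SPR number. It relies on the instruction formats given in the Programming Environments Manual (primary opcode 31, extended opcodes 339 and 467, and a 10-bit SPR field whose two 5-bit halves are swapped in the instruction word), which are not restated in this section. The function names are illustrative assumptions.

```c
#include <stdint.h>

/* The 10-bit SPR number is split into two 5-bit halves that appear swapped in
 * the instruction word: the low-order half occupies the higher-order position. */
static uint32_t spr_field(unsigned spr)
{
    return ((uint32_t)(spr & 0x1Fu) << 16) |
           ((uint32_t)((spr >> 5) & 0x1Fu) << 11);
}

/* mfspr rD,SPR: primary opcode 31, extended opcode 339. */
uint32_t encode_mfspr(unsigned rd, unsigned spr)
{
    return (31u << 26) | ((uint32_t)(rd & 0x1Fu) << 21) |
           spr_field(spr) | (339u << 1);
}

/* mtspr SPR,rS: primary opcode 31, extended opcode 467. */
uint32_t encode_mtspr(unsigned spr, unsigned rs)
{
    return (31u << 26) | ((uint32_t)(rs & 0x1Fu) << 21) |
           spr_field(spr) | (467u << 1);
}
```

For example, `encode_mfspr(3, 1)` yields 0x7C6102A6, the instruction word that reads the XER (SPR 1) into GPR3.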
[Figure 2-1 is a register diagram; the entries below are those legible in the source.] 

SUPERVISOR MODEL—OEA 
    Configuration Registers 
        Hardware Implementation Registers (1): HID0 (SPR 1008), HID1 (SPR 1009) 
        Processor Version Register: PVR (SPR 287) 
        Machine State Register: MSR 
    Memory Management Registers 
        Instruction BAT Registers: IBAT0U (SPR 528), IBAT0L (SPR 529), IBAT1U (SPR 530), 
            IBAT1L (SPR 531), IBAT2U (SPR 532), IBAT2L (SPR 533), IBAT3U (SPR 534), 
            IBAT3L (SPR 535) 
        Data BAT Registers: DBAT0U (SPR 536), DBAT0L (SPR 537), DBAT1U (SPR 538), 
            DBAT1L (SPR 539), DBAT2U (SPR 540), DBAT2L (SPR 541), DBAT3U (SPR 542), 
            DBAT3L (SPR 543) 
        Segment Registers: SR0, SR1, ... 
    Miscellaneous Registers: External Access, Time Base, Decrementer 
    Performance Monitor Registers 
    Power/Thermal Management Registers 

USER MODEL—VEA 
    Time Base Facility (For Reading): TBL (TBR 268), TBU (TBR 269) 

USER MODEL—UISA 
    Count Register: CTR 
    XER 
    General-Purpose Registers: GPR0, GPR1, ..., GPR31 
    Floating-Point Registers: FPR0, ... 
    CR 
    Performance Monitor Registers (For Reading): Sampled Instruction Address, USIA (SPR 939) 

1 These registers are MPC750-specific registers. They may not be supported by other processors that implement the 
PowerPC architecture. 

2 Not supported by the MPC740. 


Figure 2-1. Programming Model—MPC750 Microprocessor Registers 
Implementation Note—The MPC750 fully decodes the SPR field of the instruction. If the 
SPR specified is undefined, the illegal instruction program exception occurs. The user-level 
registers are described as follows: 


• User-level registers (UISA)—The user-level registers can be accessed by all 
software with either user or supervisor privileges. They include the following: 


— General-purpose registers (GPRs). The thirty-two GPRs (GPR0-GPR31) serve 
as data source or destination registers for integer instructions and provide data 
for generating addresses. See “General Purpose Registers (GPRs),” in Chapter 2, 
“PowerPC Register Set,” of the Programming Environments Manual for more 
information. 


— Floating-point registers (FPRs). The thirty-two FPRs (FPR0-FPR31) serve as 
the data source or destination for all floating-point instructions. See 
“Floating-Point Registers (FPRs),” in Chapter 2, “PowerPC Register Set,” of the 
Programming Environments Manual. 


— Condition register (CR). The 32-bit CR consists of eight 4-bit fields, CR0-CR7, 
that reflect results of certain arithmetic operations and provide a mechanism for 
testing and branching. See “Condition Register (CR),” in Chapter 2, “PowerPC 
Register Set,” of the Programming Environments Manual. 


— Floating-point status and control register (FPSCR). The FPSCR contains all 
floating-point exception signal bits, exception summary bits, exception enable 
bits, and rounding control bits needed for compliance with the IEEE 754 
standard. See “Floating-Point Status and Control Register (FPSCR),” in 
Chapter 2, “PowerPC Register Set,” of the Programming Environments Manual. 


The remaining user-level registers are SPRs. Note that the PowerPC architecture 
provides a separate mechanism for accessing SPRs (the mtspr and mfspr 
instructions). These instructions are commonly used to explicitly access certain 
registers, while other SPRs may be more typically accessed as the side effect of 
executing other instructions. 


— Integer exception register (XER). The XER indicates overflow and carries for 
integer operations. See “XER Register (XER),” in Chapter 2, “PowerPC Register 
Set,” of the Programming Environments Manual for more information. 


Implementation Note—To allow emulation of the lscbx instruction defined by 
the POWER architecture, XER[16-23] are implemented so that they can be read 
with mfspr[XER] and written with mtxer[XER] instructions. 


— Link register (LR). The LR provides the branch target address for the Branch 
Conditional to Link Register (bclrx) instruction, and can be used to hold the 
logical address of the instruction that follows a branch and link instruction, 
typically used for linking to subroutines. See “Link Register (LR),” in Chapter 2, 
“PowerPC Register Set,” of the Programming Environments Manual. 


— Count register (CTR). The CTR holds a loop count that can be decremented 
during execution of appropriately coded branch instructions. The CTR can also 
provide the branch target address for the Branch Conditional to Count Register 
(bcctrx) instruction. See “Count Register (CTR),” in Chapter 2, “PowerPC 
Register Set,” of the Programming Environments Manual. 


• User-level registers (VEA)—The PowerPC VEA defines the time base facility 
(TB), which consists of two 32-bit registers—time base upper (TBU) and time base 
lower (TBL). The time base registers can be written to only by supervisor-level 
instructions but can be read by both user- and supervisor-level software. For more 
information, see “PowerPC VEA Register Set—Time Base,” in Chapter 2, 
“PowerPC Register Set,” of the Programming Environments Manual. 


• Supervisor-level registers (OEA)—The OEA defines the registers an operating 
system uses for memory management, configuration, exception handling, and other 
operating system functions. The OEA defines the following supervisor-level 
registers for 32-bit implementations: 


— Configuration registers 


— Machine state register (MSR). The MSR defines the state of the processor. 
The MSR can be modified by the Move to Machine State Register (mtmsr), 
System Call (sc), and Return from Exception (rfi) instructions. It can be read 
by the Move from Machine State Register (mfmsr) instruction. When an 
exception is taken, the contents of the MSR are saved to the machine status 
save/restore register 1 (SRR1), which is described below. See “Machine State 
Register (MSR),” in Chapter 2, “PowerPC Register Set,” of the Programming 
Environments Manual for more information. 

Implementation Note—Table 2-1 describes MSR bits the MPC750 
implements that are not required by the PowerPC architecture. 


Table 2-1. Additional MSR Bits 

Bit | Name | Description 
13  | POW  | Power management enable. Optional to the PowerPC architecture. 
             0 Power management is disabled. 
             1 Power management is enabled. The processor can enter a power-saving mode when additional 
               conditions are present. The mode chosen is determined by the DOZE, NAP, and SLEEP bits in 
               the hardware implementation-dependent register 0 (HID0), described in Table 2-4. 
29  | PM   | Performance monitor marked mode. This bit is specific to the MPC750, and is defined as reserved 
             by the PowerPC architecture. See Chapter 11, “Performance Monitor.” 
             0 Process is not a marked process. 
             1 Process is a marked process. 








Note that clearing MSR[EE] masks not only the architecture-defined external 
interrupt and decrementer exceptions but also the MPC750-specific system 
management, performance monitor, and thermal management exceptions. 

— Processor version register (PVR). This register is a read-only register that 
identifies the version (model) and revision level of the processor. For more 
information, see “Processor Version Register (PVR),” in Chapter 2, 
“PowerPC Register Set,” of the Programming Environments Manual. 


Implementation Note—The processor version number is 0x0008 for the 
MPC750. The processor revision level starts at 0x0100 and is updated for each 
silicon revision. 


— Memory management registers 


— Block-address translation (BAT) registers. The PowerPC OEA includes an 
array of block address translation registers that can be used to specify four 
blocks of instruction space and four blocks of data space. The BAT registers 
are implemented in pairs—four pairs of instruction BATs (IBAT0U-IBAT3U 
and IBAT0L-IBAT3L) and four pairs of data BATs (DBAT0U-DBAT3U and 
DBAT0L-DBAT3L). Figure 2-1 lists the SPR numbers for the BAT registers. 
For more information, see “BAT Registers,” in Chapter 2, “PowerPC Register 
Set,” of the Programming Environments Manual. Because BAT upper and 
lower words are loaded separately, software must ensure that BAT translations 
are correct during the time that both BAT entries are being loaded. 


The MPC750 implements the G bit in the IBAT registers; however, attempting 
to execute code from an IBAT area with G = 1 causes an ISI exception. This 
complies with the revision of the architecture described in the Programming 
Environments Manual. 


— SDR1. The SDR1 register specifies the page table base address used in 
virtual-to-physical address translation. See “SDR1,” in Chapter 2, “PowerPC 
Register Set,” of the Programming Environments Manual. 


— Segment registers (SR). The PowerPC OEA defines sixteen 32-bit segment 
registers (SR0-SR15). Note that the SRs are implemented on 32-bit 
implementations only. The fields in the segment register are interpreted 
differently depending on the value of bit 0. See “Segment Registers,” in 
Chapter 2, “PowerPC Register Set,” of the Programming Environments 
Manual for more information. 

Note that the MPC750 implements separate memory management units 
(MMUs) for instruction and data. It associates the architecture-defined SRs 
with the data MMU (DMMU). It reflects the values of the SRs in separate, 
so-called ‘shadow’ segment registers in the instruction MMU (IMMU). 

— Exception-handling registers 


— Data address register (DAR). After a DSI or an alignment exception, DAR is 
set to the effective address (EA) generated by the faulting instruction. See 
“Data Address Register (DAR),” in Chapter 2, “PowerPC Register Set,” of the 
Programming Environments Manual for more information. 


— SPRG0-SPRG3. The SPRG0-SPRG3 registers are provided for operating 
system use. See “SPRG0-SPRG3,” in Chapter 2, “PowerPC Register Set,” of 
the Programming Environments Manual for more information. 


— DSISR. The DSISR register defines the cause of DSI and alignment 
exceptions. See “DSISR,” in Chapter 2, “PowerPC Register Set,” of the 
Programming Environments Manual for more information. 


— Machine status save/restore register 0 (SRR0). The SRR0 register is used to 
save the address of the instruction at which execution continues when rfi 
executes at the end of an exception handler routine. See “Machine Status 
Save/Restore Register 0 (SRR0),” in Chapter 2, “PowerPC Register Set,” of 
the Programming Environments Manual for more information. 


— Machine status save/restore register 1 (SRR1). The SRR1 register is used to 
save machine status on exceptions and to restore machine status when rfi 
executes. See “Machine Status Save/Restore Register 1 (SRR1),” in 
Chapter 2, “PowerPC Register Set,” of the Programming Environments 
Manual for more information. 


Implementation Note—When a machine check exception occurs, the 
MPC750 sets one or more error bits in SRR1. Table 2-2 describes SRR1 bits 
the MPC750 implements that are not required by the PowerPC architecture. 


Table 2-2. Additional SRR1 Bits 

Bit | Name  | Description 
11  | L2DP  | Set by a data parity error on the L2 bus. The MPC740 does not implement the L2 cache interface. 
12  | MCPIN | Set by the assertion of MCP 
13  | TEA   | Set by a TEA assertion on the 60x bus 
14  | DP    | Set by a data parity error on the 60x bus 
15  | AP    | Set by an address parity error on the 60x bus 














— Miscellaneous registers 


— Time base (TB). The TB is a 64-bit structure provided for maintaining the 
time of day and operating interval timers. The TB consists of two 32-bit 
registers—time base upper (TBU) and time base lower (TBL). The time base 
registers can be written to only by supervisor-level software, but can be read 
by both user- and supervisor-level software. See “Time Base Facility 
(TB)—OEA,” in Chapter 2, “PowerPC Register Set,” of the Programming 
Environments Manual for more information. 


— Decrementer register (DEC). This register is a 32-bit decrementing counter 
that provides a mechanism for causing a decrementer exception after a 
programmable delay; the frequency is a subdivision of the processor clock. 
See “Decrementer Register (DEC),” in Chapter 2, “PowerPC Register Set,” of 
the Programming Environments Manual for more information. 


Implementation Note—In the MPC750 the decrementer register is 
decremented and the time base is incremented at a speed that is one-fourth the 
speed of the bus clock. 
— Data address breakpoint register (DABR)—This optional register is used to 
cause a breakpoint exception if a specified data address is encountered. See 
“Data Address Breakpoint Register (DABR),” in Chapter 2, “PowerPC 
Register Set,” of the Programming Environments Manual. 


— External access register (EAR). This optional register is used in conjunction 
with eciwx and ecowx. Note that the EAR register and the eciwx and ecowx 
instructions are optional in the PowerPC architecture and may not be 
supported in all processors that implement the OEA. See “External Access 
Register (EAR),” in Chapter 2, “PowerPC Register Set,” of the Programming 
Environments Manual for more information. 


• MPC750-specific registers—The PowerPC architecture allows 
implementation-specific SPRs. Those incorporated in the MPC750 are described as 
follows. Note that in the MPC750, these registers are all supervisor-level registers. 


— Instruction address breakpoint register (IABR)—This register can be used to 
cause a breakpoint exception if a specified instruction address is encountered. 


— Hardware implementation-dependent register 0 (HID0)—This register controls 
various functions, such as enabling checkstop conditions, and locking, enabling, 
and invalidating the instruction and data caches. 


— Hardware implementation-dependent register 1 (HID1)—This register reflects 
the state of PLL_CFG[0-3] clock signals. 


— The L2 cache control register (L2CR) is used to configure and operate the L2 
cache. It includes bits for enabling parity checking, setting the L2-to-processor 
clock ratio, and identifying the type of RAM used for the L2 cache 
implementation. (Not supported in the MPC740.) 


— Performance monitor registers. The following registers are used to define and 
count events for use by the performance monitor: 


— The performance monitor counter registers (PMC1-PMC4) are used to record 
the number of times a certain event has occurred. UPMC1-UPMC4 provide 
user-level read access to these registers. 


— The monitor mode control registers (MMCR0-MMCR1) are used to enable 
various performance monitor interrupt functions. UMMCR0-UMMCR1 
provide user-level read access to these registers. 


— The sampled instruction address register (SIA) contains the effective address 
of an instruction executing at or around the time that the processor signals the 
performance monitor interrupt condition. USIA provides user-level read 
access to the SIA. 


— The MPC750 does not implement the sampled data address register (SDA) or 
the user-level, read-only USDA registers. However, for compatibility with 
processors that do, those registers can be written to by boot code without 
causing an exception. SDA is SPR 959; USDA is SPR 943. 


— The instruction cache throttling control register (ICTC) has bits for enabling the 
instruction cache throttling feature and for controlling the interval at which 
instructions are forwarded to the instruction buffer in the fetch unit. This 
provides control over the processor’s overall junction temperature. 


— Thermal management registers (THRM1, THRM2, and THRM3). Used to 
enable and set thresholds for the thermal management facility. 


— THRM1 and THRM2 provide the ability to compare the junction temperature 
against two user-provided thresholds. The dual thresholds allow the thermal 
management software differing degrees of action in lowering the junction 
temperature. The TAU can be also operated in a single threshold mode in 
which the thermal sensor output is compared to only one threshold in either 
THRM1 or THRM2. 


— THRM3 is used to enable the thermal management assist unit (TAU) and to 
control the comparator output sample time. 


Note that while it is not guaranteed that the implementation of MPC750-specific registers 
is consistent among processors of this family, other processors may implement similar or 
identical registers. 
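
The bit numbers used in Table 2-1 and Table 2-2 follow the PowerPC convention in which bit 0 is the most-significant bit of a 32-bit register. The following C sketch shows how those bit numbers translate into masks and how a PVR value might be split into its version and revision parts. It assumes the PVR layout described in the Programming Environments Manual (version in the upper halfword, revision in the lower halfword); the macro and function names are hypothetical, and the code works on integer values rather than reading the registers themselves.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit 0 is the most-significant bit of a 32-bit register. */
#define PPC_BIT32(n) ((uint32_t)1u << (31 - (n)))

/* Table 2-1: additional MSR bits. */
#define MSR_POW PPC_BIT32(13)
#define MSR_PM  PPC_BIT32(29)

/* Table 2-2: additional SRR1 bits set on a machine check. */
#define SRR1_L2DP  PPC_BIT32(11)
#define SRR1_MCPIN PPC_BIT32(12)
#define SRR1_TEA   PPC_BIT32(13)
#define SRR1_DP    PPC_BIT32(14)
#define SRR1_AP    PPC_BIT32(15)
#define SRR1_MCHK_BITS \
    (SRR1_L2DP | SRR1_MCPIN | SRR1_TEA | SRR1_DP | SRR1_AP)

/* Assumed PVR layout: version in the upper halfword, revision in the lower. */
uint16_t pvr_version(uint32_t pvr)  { return (uint16_t)(pvr >> 16); }
uint16_t pvr_revision(uint32_t pvr) { return (uint16_t)(pvr & 0xFFFFu); }

/* The text gives 0x0008 as the processor version number for the MPC750. */
bool pvr_is_mpc750(uint32_t pvr)
{
    return pvr_version(pvr) == 0x0008u;
}

/* Extract only the machine check cause bits from a saved SRR1 value. */
uint32_t srr1_machine_check_bits(uint32_t srr1)
{
    return srr1 & SRR1_MCHK_BITS;
}
```

A machine check handler written along these lines could test the returned value against SRR1_TEA, SRR1_DP, and so on to decide which error to report.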


2.1.2 MPC750-Specific Registers 


This section describes registers that are defined for the MPC750 but are not included in the 
PowerPC architecture. 


2.1.2.1. Instruction Address Breakpoint Register (IABR) 


The instruction address breakpoint register (IABR), shown in Figure 2-2, supports the 
instruction address breakpoint exception. When this exception is enabled, instruction fetch 
addresses are compared with an effective address stored in the IABR. If the word specified 
in the IABR is fetched, the instruction breakpoint handler is invoked. The instruction that 
triggers the breakpoint does not execute before the handler is invoked. For more information, 
see Section 4.5.14, “Instruction Address Breakpoint Exception (0x01300).” The IABR can be 
accessed with mtspr and mfspr using SPR 1010. 


Address (bits 0-29) | BE (bit 30) | TE (bit 31) 


Figure 2-2. Instruction Address Breakpoint Register 


The IABR bits are described in Table 2-3. 
Table 2-3. Instruction Address Breakpoint Register Bit Settings 

Bits | Name    | Description 
0-29 | Address | Word address to be compared 
30   | BE      | Breakpoint enabled. Setting this bit indicates that breakpoint checking is to be done. 
31   | TE      | Translation enabled. An IABR match is signaled if this bit matches MSR[IR]. 
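
The field layout in Table 2-3 can be expressed as masks. The following C sketch composes an IABR value and models the match condition described above (breakpoint enabled, same word address, and TE equal to MSR[IR]). It is a software model for illustration; the macro and function names are hypothetical, and the real comparison is performed by the processor's fetch logic.

```c
#include <stdint.h>
#include <stdbool.h>

#define IABR_ADDR_MASK 0xFFFFFFFCu  /* bits 0-29: word address */
#define IABR_BE        0x00000002u  /* bit 30: breakpoint enabled */
#define IABR_TE        0x00000001u  /* bit 31: translation enabled */

/* Build an IABR image from an effective address and the two control bits.
 * The two low-order address bits are dropped because the compare is per word. */
uint32_t iabr_make(uint32_t ea, bool enable, bool translation)
{
    return (ea & IABR_ADDR_MASK) |
           (enable ? IABR_BE : 0u) |
           (translation ? IABR_TE : 0u);
}

/* Model of the match condition: enabled, same word address, TE equals MSR[IR]. */
bool iabr_matches(uint32_t iabr, uint32_t fetch_ea, bool msr_ir)
{
    if ((iabr & IABR_BE) == 0u)
        return false;
    if (((iabr ^ fetch_ea) & IABR_ADDR_MASK) != 0u)
        return false;
    return ((iabr & IABR_TE) != 0u) == msr_ir;
}
```

The resulting value would be written with mtspr to SPR 1010.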














2.1.2.2 Hardware Implementation-Dependent Register 0 


The hardware implementation-dependent register 0 (HID0) controls the state of several 
functions within the MPC750. The HID0 register is shown in Figure 2-3. 











[Bit-field diagram of HID0. Field labels legible in the source: EMCP, BCLK, ECLK, DOZE, SLEEP, ILOCK, DLOCK, 
NOOPTI, and Reserved. Bit assignments are listed in Table 2-4.] 


Figure 2-3. Hardware Implementation-Dependent Register 0 (HID0) 


The HID0 bits are described in Table 2-4. 

Table 2-4. HID0 Bit Functions 

Bits | Name | Function 
0 | EMCP | Enable MCP. The primary purpose of this bit is to mask out further machine check exceptions caused 
        by assertion of MCP, similar to how MSR[EE] can mask external interrupts. 
        0 Masks MCP. Asserting MCP does not generate a machine check exception or a checkstop. 
        1 Asserting MCP causes a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1. 
1 | DBP | Disable 60x bus address and data parity generation. 
        0 The system generates address and data parity. 
        1 Parity generation is disabled and parity signals are driven to 0 during bus operations. When parity 
          generation is disabled, all parity checking should also be disabled and parity signals need not be 
          connected. 
2 | EBA | Enable/disable 60x bus address parity checking 
        0 Prevents address parity checking. 
        1 Allows an address parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception 
          if MSR[ME] = 1. 
        EBA and EBD allow the processor to operate with memory subsystems that do not generate parity. 
3 | EBD | Enable 60x bus data parity checking 
        0 Parity checking is disabled. 
        1 Allows a data parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if 
          MSR[ME] = 1. 
        EBA and EBD allow the processor to operate with memory subsystems that do not generate parity. 
4 | BCLK | CLK_OUT output enable and clock type selection. Used in conjunction with HID0[ECLK] and the 
        HRESET signal to configure CLK_OUT. See Table 2-5. 
5 | — | Not used. Defined as EICE on some earlier processors. 
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Table 2-4. HIDO Bit Functions (continued) 
































Bits | Name Function 
6 ECLK |CLK_OUT output enable and clock type selection. Used in conjunction with HIDO[BCLK] and the 
HRESET signal to con gure CLK_OUT. See Table 2-5. 
7 PAR _ | Disable precharge of ARTRY. 

0 Precharge of ARTRY enabled 

1 Alters bus protocol slightly by preventing the processor from driving ARTRY to high (negated) 
state. If this is done, the system must restore the signals to the high state. 

8 DOZE | Doze mode enable. Operates in conjunction with MSR[POW]. 

0 Doze mode disabled. 

1 Doze mode enabled. Doze mode is invoked by setting MSR[POW] while this bit is set. In doze 
mode, the PLL, time base, and snooping remain active. 

9 NAP | Nap mode enable. Operates in conjunction with MSR[POW]. 

0 Nap mode disabled. 

1 Nap mode enabled. Doze mode is invoked by setting MSR[POW] while this bit is set. In nap mode, 
the PLL and the time base remain active. 

10 SLEEP | Sleep mode enable. Operates in conjunction with MSR[POW]. 

0 Sleep mode disabled. 

1 Sleep mode enabled. Sleep mode is invoked by setting MSR[POW] while this bit is set. QREQ is 
asserted to indicate that the processor is ready to enter sleep mode. If the system logic 
determines that the processor may enter sleep mode, the quiesce acknowledge signal, QACK, is 
asserted back to the processor. Once QACK assertion is detected, the processor enters sleep 
mode after several processor clocks. At this point, the system logic may turn off the PLL by rst 
con gur ing PLL_CFG[0—3] to PLL bypass mode, then disabling SYSCLK. 

11 DPM _| Dynamic power management enable. 

0 Dynamic power management is disabled. 

1 Functional units enter a low-power mode automatically if the unit is idle. This does not affect 
operational performance and is transparent to software or any external hardware. 

12-14 — Not used 
15 NHR_ | Not hard reset (software-use only)—Helps software distinguish a hard reset from a soft reset. 

0 Ahard reset occurred if software had previously set this bit. 

1 Ahard reset has not occurred. If software sets this bit after a hard reset, when a reset occurs and 
this bit remains set, software can tell it was a soft reset. 

16 ICE | Instruction cache enable

0 The instruction cache is neither accessed nor updated. All pages are accessed as if they were
marked cache-inhibited (WIM = x1x). Potential cache accesses from the bus (snoop and cache
operations) are ignored. In the disabled state for the L1 caches, the cache tag state bits are
ignored and all accesses are propagated to the L2 cache or bus as single-beat transactions. For
those transactions, however, CI reflects the original state determined by address translation
regardless of cache disabled status. ICE is zero at power-up.

1 The instruction cache is enabled.

17 DCE | Data cache enable

0 The data cache is neither accessed nor updated. All pages are accessed as if they were marked
cache-inhibited (WIM = x1x). Potential cache accesses from the bus (snoop and cache operations)
are ignored. In the disabled state for the L1 caches, the cache tag state bits are ignored and all
accesses are propagated to the L2 cache or bus as single-beat transactions. For those
transactions, however, CI reflects the original state determined by address translation regardless
of cache disabled status. DCE is zero at power-up.

1 The data cache is enabled.
Table 2-4. HID0 Bit Functions (continued)

Bits | Name | Function

18 ILOCK | Instruction cache lock

0 Normal operation

1 Instruction cache is locked. A locked cache supplies data normally on a hit, but a miss is treated as a
cache-inhibited transaction. On a miss, the transaction to the bus or the L2 cache is
single-beat; however, CI still reflects the original state as determined by address translation,
independent of cache locked or disabled status.

To prevent locking during a cache access, an isync instruction must precede the setting of ILOCK.





19 DLOCK | Data cache lock

0 Normal operation

1 Data cache is locked. A locked cache supplies data normally on a hit, but a miss is treated as a
cache-inhibited transaction. On a miss, the transaction to the bus or the L2 cache is
single-beat; however, CI still reflects the original state as determined by address translation,
independent of cache locked or disabled status. A snoop hit to a locked L1 data cache performs
as if the cache were not locked. A cache block invalidated by a snoop remains invalid until the
cache is unlocked.

To prevent locking during a cache access, a sync instruction must precede the setting of DLOCK.
20 ICFI | Instruction cache flash invalidate

0 The instruction cache is not invalidated. The bit is cleared when the invalidation operation begins
(usually the next cycle after the write operation to the register). The instruction cache must be
enabled for the invalidation to occur.

1 An invalidate operation is issued that marks the state of each instruction cache block as invalid
without writing back modified cache blocks to memory. Cache access is blocked during this time.
Bus accesses to the cache are signaled as a miss during invalidate-all operations. Setting ICFI
clears all the valid bits of the blocks and resets the PLRU bits so that they point to way L0 of each
set. Once the L1 flash invalidate bits are set through an mtspr operation, hardware automatically
resets these bits in the next cycle (provided that the corresponding cache enable bits are set in HID0).

Note that in the MPC603e processors, the proper use of the ICFI and DCFI bits was to set them and
clear them in two consecutive mtspr operations. Software that already has this sequence of
operations does not need to be changed to run on the MPC750.





21 DCFI | Data cache flash invalidate

0 The data cache is not invalidated. The bit is cleared when the invalidation operation begins
(usually the next cycle after the write operation to the register). The data cache must be enabled
for the invalidation to occur.

1 An invalidate operation is issued that marks the state of each data cache block as invalid without
writing back modified cache blocks to memory. Cache access is blocked during this time. Bus
accesses to the cache are signaled as a miss during invalidate-all operations. Setting DCFI clears
all the valid bits of the blocks and resets the PLRU bits so that they point to way L0 of each set.
Once the L1 flash invalidate bits are set through an mtspr operation, hardware automatically resets
these bits in the next cycle (provided that the corresponding cache enable bits are set in HID0).

Note that in the MPC603e processors, the proper use of the ICFI and DCFI bits was to set them and
clear them in two consecutive mtspr operations. Software that already has this sequence of
operations does not need to be changed to run on the MPC750.
22 SPD | Speculative cache access disable

0 Speculative bus accesses to nonguarded space (G = 0) from both the instruction and data caches
are enabled.

1 Speculative bus accesses to nonguarded space from both caches are disabled.








23 IFEM | Enable M bit on bus for instruction fetches

0 M bit not reflected on bus for instruction fetches. Instruction fetches are treated as nonglobal on
the bus.

1 Instruction fetches reflect the M bit from the WIM settings.
Table 2-4. HID0 Bit Functions (continued)

Bits | Name | Function

24 SGE | Store gathering enable

0 Store gathering is disabled.

1 Integer store gathering is performed for write-through to nonguarded space or for cache-inhibited
stores to nonguarded space for 4-byte, word-aligned stores. The LSU combines stores to form a
double word that is sent out on the 60x bus as a single-beat operation. Stores are gathered only
if successive, eligible stores are queued and pending. Store gathering is performed regardless
of address order or endian mode.

25 DCFA | Data cache flush assist. (Force data cache to ignore invalid sets on miss replacement selection.)

0 The data cache flush assist facility is disabled.

1 The miss replacement algorithm ignores invalid entries and follows the replacement sequence
defined by the PLRU bits. This reduces the series of uniquely addressed load or dcbz instructions
to eight per set. The bit should be set just before beginning a cache flush routine and should be
cleared when the series of instructions is complete.

26 BTIC | BTIC enable. Enables use of the 64-entry branch target instruction cache.

0 The BTIC contents are invalidated and the BTIC behaves as if it were empty. New entries cannot 
be added until the BTIC is enabled. 

1 The BTIC is enabled and new entries can be added. 

27 — | Not used. Defined as FBIOB on earlier 603-type processors.
28 ABE | Address broadcast enable. Controls whether certain address-only operations (such as cache
operations, eieio, and sync) are broadcast on the 60x bus.

0 Address-only operations affect only local L1 and L2 caches and are not broadcast. 

1 Address-only operations are broadcast on the 60x bus. Affected instructions are eieio, sync,
dcbi, dcbf, and dcbst. A sync instruction completes only after a successful broadcast. Execution
of eieio causes a broadcast that may be used to prevent any external devices, such as a bus
bridge chip, from store gathering.

Note that dcbz (with M = 1, coherency required) always broadcasts on the 60x bus regardless of the
setting of this bit. An icbi is never broadcast. No cache operations, except dcbz, are snooped by the
MPC750 regardless of whether ABE is set. Bus activity caused by these instructions results directly
from performing the operation on the MPC750 cache.

29 BHT | Branch history table enable 

0 BHT disabled. The MPC750 uses static branch prediction as defined by the PowerPC architecture
(UISA) for those branch instructions the BHT would have otherwise used to predict (that is, those
that use the CR as the only mechanism to determine direction). For more information on static
branch prediction, see “Conditional Branch Control,” in Chapter 4 of the Programming
Environments Manual.

1 Allows the use of the 512-entry branch history table (BHT).

The BHT is disabled at power-on reset. All entries are set to weakly not-taken.

30 — Not used 
31 NOOPTI | No-op the data cache touch instructions.

0 The dcbt and dcbtst instructions are enabled.
1 The dcbt and dcbtst instructions are no-oped globally.





Table 2-5 shows how HID0[BCLK], HID0[ECLK], and HRESET are used to configure
CLK_OUT. See Section 7.2.11.2, “Clock Out (CLK_OUT)—Output,” for more
information.
Table 2-5. HID0[BCLK] and HID0[ECLK] CLK_OUT Configuration

HRESET   | HID0[ECLK] | HID0[BCLK] | CLK_OUT
Asserted | x          | x          | Bus
Negated  | 0          | 0          | High impedance
Negated  | 0          | 1          | Bus/2
Negated  | 1          | 0          | Core
Negated  | 1          | 1          | Bus
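The CLK_OUT selection in Table 2-5 can be read as a small lookup. The following is a minimal sketch in Python, not vendor code; the function name `clk_out_source` and the returned strings are illustrative assumptions, and the (1, 1) entry is taken as the one remaining combination in the table.

```python
def clk_out_source(hreset_asserted: bool, eclk: int, bclk: int) -> str:
    """Return the CLK_OUT source implied by Table 2-5.

    hreset_asserted: True while HRESET is asserted.
    eclk, bclk: the HID0[ECLK] and HID0[BCLK] bit values (0 or 1).
    """
    if hreset_asserted:
        # ECLK and BCLK are don't-cares while HRESET is asserted.
        return "bus"
    table = {
        (0, 0): "high impedance",
        (0, 1): "bus/2",
        (1, 0): "core",
        (1, 1): "bus",  # remaining combination in Table 2-5
    }
    return table[(eclk & 1, bclk & 1)]
```

For example, `clk_out_source(False, 1, 0)` selects the core clock.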

















HID0 can be accessed with mtspr and mfspr using SPR 1008.
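Because the bit numbers in Table 2-4 count from the most significant bit (bit 0) down to the least significant (bit 31), turning a bit number into a mask is a common source of mistakes. The sketch below, written in Python purely for illustration, derives masks for a few HID0 bits from their table positions. The helper names `ppc_bit` and `select_power_mode` are assumptions of this example. Setting only one of DOZE, NAP, and SLEEP at a time is a choice made here; the table does not say what happens if several are set.

```python
def ppc_bit(n: int, width: int = 32) -> int:
    """Mask for bit n in PowerPC numbering, where bit 0 is the most significant bit."""
    if not 0 <= n < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - n)


# Bit positions as listed in Table 2-4.
HID0_DOZE = ppc_bit(8)
HID0_NAP = ppc_bit(9)
HID0_SLEEP = ppc_bit(10)
HID0_DPM = ppc_bit(11)
HID0_ICE = ppc_bit(16)
HID0_DCE = ppc_bit(17)
HID0_DCFI = ppc_bit(21)

POWER_MODES = {"doze": HID0_DOZE, "nap": HID0_NAP, "sleep": HID0_SLEEP}


def select_power_mode(hid0: int, mode: str) -> int:
    """Return hid0 with exactly one of DOZE, NAP, or SLEEP set (a choice of this sketch)."""
    cleared = hid0 & ~(HID0_DOZE | HID0_NAP | HID0_SLEEP)
    return (cleared | POWER_MODES[mode]) & 0xFFFFFFFF
```

The resulting value would still have to be written to HID0 with mtspr, and the selected mode is entered only once MSR[POW] is set.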


2.1.2.3 Hardware Implementation-Dependent Register 1


The hardware implementation-dependent register 1 (HID1) reflects the state of the 
PLL_CFG[0-3] signals. The HID1 bits are shown in Figure 2-4. 





Figure 2-4. Hardware Implementation-Dependent Register 1 (HID1)


The HID1 bits are described in Table 2-6.
Table 2-6. HID1 Bit Functions

Bit(s) | Name | Description
0      | PC0  | PLL configuration bit 0 (read-only)
1      | PC1  | PLL configuration bit 1 (read-only)
2      | PC2  | PLL configuration bit 2 (read-only)
3      | PC3  | PLL configuration bit 3 (read-only)
4-31   | —    | Reserved

Note: The clock configuration bits reflect the state of the PLL_CFG[0-3] signals.


HID1 can be accessed with mtspr and mfspr using SPR 1009. 
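Since PC0-PC3 occupy bits 0-3, they are the four most significant bits of the 32-bit value returned by mfspr. A minimal Python sketch of the extraction follows; the function name `pll_cfg_from_hid1` is an assumption of this example, and how the 4-bit code maps to a clock ratio is not covered here.

```python
def pll_cfg_from_hid1(hid1: int) -> int:
    """Extract PC0-PC3 (HID1 bits 0-3) as a 4-bit value, with PC0 as the high bit."""
    return (hid1 >> 28) & 0xF
```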


2.1.2.4 Performance Monitor Registers 


This section describes the registers used by the performance monitor, which is described in 
Chapter 11, “Performance Monitor.” 
2.1.2.4.1 Monitor Mode Control Register 0 (MMCR0)


The monitor mode control register 0 (MMCR0), shown in Figure 2-5, is a 32-bit SPR
provided to specify events to be counted and recorded. MMCR0 can be accessed only
in supervisor mode. User-level software can read the contents of MMCR0 by issuing an
mfspr instruction to UMMCR0, described in Section 2.1.2.4.2, “User Monitor Mode
Control Register 0 (UMMCR0).”
Figure 2-5. Monitor Mode Control Register 0 (MMCR0)


This register must be cleared at power up. Reading this register does not change its
contents. The bits of the MMCR0 register are described in Table 2-7.


Table 2-7. MMCR0 Bit Settings

Bits | Name | Description

0 DIS | Disables counting unconditionally
0 The values of the PMCn counters can be changed by hardware.
1 The values of the PMCn counters cannot be changed by hardware.

1 DP | Disables counting while in supervisor mode
0 The PMCn counters can be changed by hardware.
1 If the processor is in supervisor mode (MSR[PR] is cleared), the counters are not
changed by hardware.

2 DU | Disables counting while in user mode
0 The PMCn counters can be changed by hardware.
1 If the processor is in user mode (MSR[PR] is set), the PMCn counters are not
changed by hardware.

3 DMS | Disables counting while MSR[PM] is set
0 The PMCn counters can be changed by hardware.
1 If MSR[PM] is set, the PMCn counters are not changed by hardware.

4 DMR | Disables counting while MSR[PM] is zero
0 The PMCn counters can be changed by hardware.
1 If MSR[PM] is cleared, the PMCn counters are not changed by hardware.

5 ENINT | Enables performance monitor interrupt signaling.
0 Interrupt signaling is disabled.
1 Interrupt signaling is enabled.

Cleared by hardware when a performance monitor interrupt is signaled. To reenable
these interrupt signals, software must set this bit after handling the performance monitor
interrupt. The IPL ROM code clears this bit before passing control to the operating
system.
6 DISCOUNT | Disables counting of PMCn when a performance monitor interrupt is signaled (that is,
((PMCnINTCONTROL = 1) & (PMCn[0] = 1) & (ENINT = 1)) or the occurrence of an
enabled time base transition with ((INTONBITTRANS = 1) & (ENINT = 1))).
0 Signaling a performance monitor interrupt does not affect counting status of PMCn.
1 The signaling of a performance monitor interrupt prevents changing of the PMC1
counter. The PMCn counters do not change if PMC2COUNTCTL = 0.

Because a time base signal could have occurred along with an enabled counter
overflow condition, software should always reset INTONBITTRANS to zero if the value
in INTONBITTRANS was a one.

7-8 RTCSELECT | 64-bit time base, bit selection enable
00 Pick bit 63 to count
01 Pick bit 55 to count
10 Pick bit 51 to count
11 Pick bit 47 to count

9 INTONBITTRANS | Cause interrupt signaling on bit transition (identified in RTCSELECT) from off to on
0 Do not allow interrupt signal if chosen bit transitions.
1 Signal interrupt if chosen bit transitions.

Software is responsible for setting and clearing INTONBITTRANS.

10-15 THRESHOLD | Threshold value. The MPC750 supports all 6 bits, allowing threshold values from 0-63.
The intent of the THRESHOLD support is to characterize L1 data cache misses.

16 PMC1INTCONTROL | Enables interrupt signaling due to PMC1 counter overflow.
0 Disable PMC1 interrupt signaling due to PMC1 counter overflow.
1 Enable PMC1 interrupt signaling due to PMC1 counter overflow.

17 PMCINTCONTROL | Enables interrupt signaling due to any PMC2-PMC4 counter overflow. Overrides the
setting of DISCOUNT.
0 Disable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow.
1 Enable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow.

18 PMCTRIGGER | Can be used to trigger counting of PMC2-PMC4 after PMC1 has overflowed or after a
performance monitor interrupt is signaled.
0 Enable PMC2-PMC4 counting.
1 Disable PMC2-PMC4 counting until either PMC1[0] = 1 or a performance monitor
interrupt is signaled.

19-25 PMC1SELECT | PMC1 input selector, 128 events selectable. See Table 2-10.

26-31 PMC2SELECT | PMC2 input selector, 64 events selectable. See Table 2-11.





MMCR0 can be accessed with mtspr and mfspr using SPR 952.
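A value for MMCR0 can be assembled from the field positions in Table 2-7. The Python sketch below is illustrative only: the names `MMCR0_FIELDS`, `pack_mmcr0`, and `unpack_mmcr0` are assumptions, and only fields whose bit ranges are stated explicitly in the table are included.

```python
# (first bit, last bit) in PowerPC numbering, where bit 0 is the most significant bit.
MMCR0_FIELDS = {
    "DIS": (0, 0),
    "DP": (1, 1),
    "DU": (2, 2),
    "DMS": (3, 3),
    "DMR": (4, 4),
    "ENINT": (5, 5),
    "THRESHOLD": (10, 15),
    "PMC1INTCONTROL": (16, 16),
    "PMCTRIGGER": (18, 18),
    "PMC1SELECT": (19, 25),
    "PMC2SELECT": (26, 31),
}


def pack_mmcr0(**fields: int) -> int:
    """Combine named field values into a 32-bit MMCR0 image."""
    value = 0
    for name, field_value in fields.items():
        first, last = MMCR0_FIELDS[name]
        width = last - first + 1
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value |= field_value << (31 - last)
    return value


def unpack_mmcr0(value: int, name: str) -> int:
    """Read one named field back out of a 32-bit MMCR0 image."""
    first, last = MMCR0_FIELDS[name]
    width = last - first + 1
    return (value >> (31 - last)) & ((1 << width) - 1)
```

For instance, `pack_mmcr0(PMC1SELECT=0b0000001, PMC2SELECT=0b000010)` selects processor cycles on PMC1 and completed instructions on PMC2, using the encodings in Tables 2-10 and 2-11.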


2.1.2.4.2 User Monitor Mode Control Register 0 (UMMCR0)


The contents of MMCR0 are reflected to UMMCR0, which can be read by user-level
software. UMMCR0 can be accessed with mfspr using SPR 936.
2.1.2.4.3 Monitor Mode Control Register 1 (MMCR1)


The monitor mode control register 1 (MMCR1) functions as an event selector for
performance monitor counter registers 3 and 4 (PMC3 and PMC4). The MMCR1 register
is shown in Figure 2-6.





Figure 2-6. Monitor Mode Control Register 1 (MMCR1)


Bit settings for MMCR1 are shown in Table 2-8. The corresponding events are described
in Section 2.1.2.4.5, “Performance Monitor Counter Registers (PMC1-PMC4).”


Table 2-8. MMCR1 Bit Settings

Bits  | Name       | Description
0-4   | PMC3SELECT | PMC3 input selector. 32 events selectable. See Table 2-12 for defined selections.
5-9   | PMC4SELECT | PMC4 input selector. 32 events selectable. See Table 2-13 for defined selections.
10-31 | —          | Reserved








MMCR1 can be accessed with mtspr and mfspr using SPR 956. User-level software can
read the contents of MMCR1 by issuing an mfspr instruction to UMMCR1, described in
Section 2.1.2.4.4, “User Monitor Mode Control Register 1 (UMMCR1).”
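The two 5-bit selectors sit at the top of MMCR1, so PMC3SELECT (bits 0-4) is shifted left by 27 and PMC4SELECT (bits 5-9) by 22. A short Python sketch under those assumptions follows; the function name `pack_mmcr1` is hypothetical and the reserved bits are left at zero.

```python
def pack_mmcr1(pmc3select: int, pmc4select: int) -> int:
    """Build a 32-bit MMCR1 image from the two 5-bit event selectors."""
    for name, value in (("PMC3SELECT", pmc3select), ("PMC4SELECT", pmc4select)):
        if not 0 <= value < 32:
            raise ValueError(f"{name} must fit in 5 bits")
    # Bits 0-4 end at bit 4 (shift 31 - 4 = 27); bits 5-9 end at bit 9 (shift 31 - 9 = 22).
    return (pmc3select << 27) | (pmc4select << 22)
```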


2.1.2.4.4 User Monitor Mode Control Register 1 (UMMCR1) 


The contents of MMCR1 are reflected to UMMCR1, which can be read by user-level
software. UMMCR1 can be accessed with mfspr using SPR 940.


2.1.2.4.5 Performance Monitor Counter Registers (PMC1-PMC4)


PMC1-PMC4, shown in Figure 2-7, are 32-bit counters that can be programmed to
generate interrupt signals when they overflow.


Figure 2-7. Performance Monitor Counter Registers (PMC1-PMC4)


The bits contained in the PMCn registers are described in Table 2-9.
Table 2-9. PMCn Bit Settings

Bits | Name          | Description
0    | OV            | Overflow. When this bit is set, it indicates that this counter has reached its maximum value.
1-31 | Counter value | Indicates the number of occurrences of the specified event.


Counters are considered to overflow when the high-order bit (the sign bit) becomes set; that
is, they reach the value 2147483648 (0x8000_0000). However, an interrupt is not signaled
unless both MMCR0[PMCnINTCONTROL] and MMCR0[ENINT] are also set.


Note that the interrupts can be masked by clearing MSR[EE]; the interrupt signal condition
may occur with MSR[EE] cleared, but the exception is not taken until EE is set. Setting
MMCR0[DISCOUNT] forces counters to stop counting when a counter interrupt occurs.


Software is expected to use mtspr to set each PMC explicitly to a nonoverflow value. If software
sets an overflow value, an erroneous exception may occur. For example, if both
MMCR0[PMCnINTCONTROL] and MMCR0[ENINT] are set and mtspr loads an overflow value,
an interrupt signal may be generated without any event counting having taken place.
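One common way to use the overflow rule is to preload a counter so that it reaches 0x8000_0000 after a chosen number of events. The Python sketch below illustrates the arithmetic only; the function names are assumptions of this example, and the interrupt check simply restates the condition given above.

```python
PMC_OVERFLOW = 0x8000_0000  # value at which the high-order (sign) bit becomes set


def preload_for(events: int) -> int:
    """Nonoverflow start value that reaches PMC_OVERFLOW after `events` counts."""
    if not 0 < events <= PMC_OVERFLOW:
        raise ValueError("events must be between 1 and 0x8000_0000")
    return PMC_OVERFLOW - events


def has_overflowed(pmc: int) -> bool:
    """True when the counter's high-order bit (PMCn[0]) is set."""
    return bool(pmc & PMC_OVERFLOW)


def interrupt_signaled(pmc: int, intcontrol: bool, enint: bool) -> bool:
    """Overflow alone is not enough; both enable bits must also be set."""
    return has_overflowed(pmc) and intcontrol and enint
```

A preload of `preload_for(1000)` leaves the high-order bit clear, so loading it with mtspr does not by itself create the overflow condition.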


The event to be monitored can be chosen by setting MMCR0[0-9]. The selected events are
counted beginning when MMCR0 is set until either MMCR0 is reset or a performance
monitor interrupt is generated. Table 2-10 lists the selectable events and their encodings.


Table 2-10. PMC1 Events: MMCR0[19-25] Select Encodings

Encoding   | Description
000 0000   | Register holds current value.
000 0001   | Number of processor cycles
000 0010   | Number of completed instructions. Does not include folded branches.
000 0011   | Number of transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 00 = 15, 01 = 19, 10 = 23, 11 = 31
000 0100   | Number of instructions dispatched: 0, 1, or 2 instructions per cycle
000 0101   | Number of eieio instructions completed
000 0110   | Number of cycles spent performing table search operations for the ITLB
000 0111   | Number of accesses that hit the L2
000 1000   | Number of valid instruction EAs delivered to the memory subsystem
000 1001   | Number of times the address of an instruction being completed matches the address in the IABR
000 1010   | Number of loads that miss the L1 with latencies that exceeded the threshold value
000 1011   | Number of branches that are unresolved when processed
000 1100   | Number of cycles the dispatcher stalls due to a second unresolved branch in the instruction stream
All others | Reserved. May be used in a later revision.


Bits MMCR0[26-31] specify events associated with PMC2, as shown in Table 2-11.
Table 2-11. PMC2 Events: MMCR0[26-31] Select Encodings

Encoding   | Description
00 0000    | Register holds current value.
00 0001    | Counts processor cycles.
00 0010    | Counts completed instructions. Does not include folded branches.
00 0011    | Counts transitions from 0 to 1 of TBL bits specified through MMCR0[RTCSELECT]. 00 = 47, 01 = 51, 10 = 55, 11 = 63.
00 0100    | Counts instructions dispatched. 0, 1, or 2 instructions per cycle.
00 0101    | Counts L1 instruction cache misses.
00 0110    | Counts ITLB misses.
00 0111    | Counts L2 instruction misses.
00 1000    | Counts branches predicted or resolved not taken.
00 1001    | Counts MSR[PR] bit toggles.
00 1010    | Counts times reserved load operations completed.
00 1011    | Counts completed load and store instructions.
00 1100    | Counts snoops to the L1 and the L2.
00 1101    | Counts L1 cast-outs to the L2.
00 1110    | Counts completed system unit instructions.
00 1111    | Counts instruction fetch misses in the L1.
01 0000    | Counts branches allowing out-of-order execution that resolved correctly.
All others | Reserved.


Bits MMCR1[0-4] specify events associated with PMC3, as shown in Table 2-12.


Table 2-12. PMC3 Events: MMCR1[0-4] Select Encodings

Encoding   | Description
0 0000     | Register holds current value.
0 0001     | Number of processor cycles
0 0010     | Number of completed instructions, not including folded branches.
0 0011     | Number of transitions from 0 to 1 of specified bits in the time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51, 2 = 55, 3 = 63.
0 0100     | Number of instructions dispatched. 0, 1, or 2 per cycle.
0 0101     | Number of L1 data cache misses
0 0110     | Number of DTLB misses
0 0111     | Number of L2 data misses
0 1000     | Number of taken branches, including predicted branches.
0 1001     | Number of transitions between marked and unmarked processes while in user mode. That is, the number of MSR[PM] toggles while the processor is in user mode.
0 1010     | Number of store conditional instructions completed
0 1011     | Number of instructions completed from the FPU
0 1100     | Number of L2 castouts caused by snoops to modified lines
0 1101     | Number of cache operations that hit in the L2 cache
0 1110     | Reserved
0 1111     | Number of cycles generated by L1 load misses
1 0000     | Number of branches in the second speculative stream that resolve correctly
1 0001     | Number of cycles the BPU stalls due to LR or CR unresolved dependencies
All others | Reserved. May be used in a later revision.


Bits MMCR1[5-9] specify events associated with PMC4, as shown in Table 2-13.


Table 2-13. PMC4 Events: MMCR1[5-9] Select Encodings

Encoding   | Comments
0 0000     | Register holds current value
0 0001     | Number of processor cycles
0 0010     | Number of completed instructions, not including folded branches
0 0011     | Number of transitions from 0 to 1 of specified bits in the time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51, 2 = 55, 3 = 63.
0 0100     | Number of instructions dispatched. 0, 1, or 2 per cycle.
0 0101     | Number of L2 castouts
0 0110     | Number of cycles spent performing table searches for DTLB accesses
0 0111     | Reserved. May be used in a later revision.
0 1000     | Number of mispredicted branches
0 1001     | Number of transitions between marked and unmarked processes while in supervisor mode. That is, the number of MSR[PM] toggles while the processor is in supervisor mode.
0 1010     | Number of store conditional instructions completed with reservation intact
0 1011     | Number of completed sync instructions
0 1100     | Number of snoop request retries
0 1101     | Number of completed integer operations
0 1110     | Number of cycles the BPU cannot process new branches due to having two unresolved branches
All others | Reserved. May be used in a later revision.











The PMC registers can be accessed with mtspr and mfspr using the following SPR numbers:


• PMC1 is SPR 953
• PMC2 is SPR 954


MPC750 RISC Microprocessor Family User’s Manual 


For More Information On This Product, 
Go to: www.freescale.com 


Freescale Semiconductor, Inc. 
The MPC750 Processor Register Set 


• PMC3 is SPR 957
• PMC4 is SPR 958


2.1.2.4.6 User Performance Monitor Counter Registers (UPMC1-UPMC4)


The contents of PMC1-PMC4 are reflected to UPMC1-UPMC4, which can be read by
user-level software. The UPMC registers can be read with mfspr using the following SPR
numbers:


• UPMC1 is SPR 937
• UPMC2 is SPR 938
• UPMC3 is SPR 941
• UPMC4 is SPR 942


2.1.2.4.7 Sampled Instruction Address Register (SIA)


The sampled instruction address register (SIA) is a supervisor-level register that contains
the effective address of an instruction executing at or around the time that the processor
signals the performance monitor interrupt condition. The SIA is shown in Figure 2-8.


Figure 2-8. Sampled Instruction Address Register (SIA)


If the performance monitor interrupt is triggered by a threshold event, the SIA contains the
address of the exact instruction (called the sampled instruction) that caused the counter to overflow.


If the performance monitor interrupt was caused by something besides a threshold event,
the SIA contains the address of the last instruction completed during that cycle. SIA can be
accessed with the mtspr and mfspr instructions using SPR 955.


2.1.2.4.8 User Sampled Instruction Address Register (USIA) 


The contents of SIA are reflected to USIA, which can be read by user-level software. USIA
can be accessed with the mfspr instruction using SPR 939.


2.1.2.4.9 Sampled Data Address Register (SDA) and User Sampled Data 
Address Register (USDA) 


The MPC750 does not implement the sampled data address register (SDA) or the 
user-level, read-only USDA registers. However, for compatibility with processors that do, 
those registers can be written to by boot code without causing an exception. SDA is 
SPR 959; USDA is SPR 943. 
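Collecting the SPR numbers quoted in this section shows a regular pattern: each user-level read-only alias is numbered 16 below its supervisor-level register. The Python sketch below records the numbers from the text and checks that relationship. The dictionary and function names are assumptions of this example, and the offset is an observation about the listed values rather than a rule stated by the manual.

```python
# Supervisor-level performance monitor SPR numbers quoted in this section.
SUPERVISOR_SPRS = {
    "MMCR0": 952, "PMC1": 953, "PMC2": 954, "SIA": 955,
    "MMCR1": 956, "PMC3": 957, "PMC4": 958, "SDA": 959,
}

# User-level read-only aliases quoted in this section.
USER_SPRS = {
    "UMMCR0": 936, "UPMC1": 937, "UPMC2": 938, "USIA": 939,
    "UMMCR1": 940, "UPMC3": 941, "UPMC4": 942, "USDA": 943,
}


def user_alias(name: str) -> tuple[str, int]:
    """Return the (name, SPR number) of the user-level alias for a supervisor SPR."""
    alias = "U" + name
    return alias, USER_SPRS[alias]
```

Note that SDA and USDA are listed only for completeness, since the MPC750 does not implement them.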
2.1.3 Instruction Cache Throttling Control Register (ICTC)


Reducing the rate of instruction fetching can control junction temperature without the 
complexity and overhead of dynamic clock control. System software can control 
instruction forwarding by writing a nonzero value to the ICTC register, a supervisor-level 
register shown in Figure 2-9. The overall junction temperature reduction comes from the 
dynamic power management of each functional unit when the MPC750 is idle in between 
instruction fetches. PLL (phase-locked loop) and DLL (delay-locked loop) configurations 
are unchanged. 





Figure 2-9. Instruction Cache Throttling Control Register (ICTC)


Table 2-14 describes the bit fields for the ICTC register.

Table 2-14. ICTC Bit Settings

Bits | Name | Description

0-22 — | Reserved

23-30 FI | Instruction forwarding interval expressed in processor clocks.
0x00 0 clock cycles
0x01 1 clock cycle
...
0xFF 255 clock cycles

31 E | Cache throttling enable
0 Disable instruction cache throttling.
1 Enable instruction cache throttling.

















Instruction cache throttling is enabled by setting ICTC[E] and writing the instruction 
forwarding interval into ICTC[FI]. Enabling, disabling, and changing the instruction 
forwarding interval affect instruction forwarding immediately. 


The ICTC register can be accessed with the mtspr and mfspr instructions using SPR 1019. 
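With FI in bits 23-30 and E in bit 31, the ICTC image is the interval shifted left by one with the enable bit in the least significant position. A minimal Python sketch follows; the function name `pack_ictc` is an assumption of this example.

```python
def pack_ictc(interval_clocks: int, enable: bool) -> int:
    """Build a 32-bit ICTC image from ICTC[FI] (0-255 clocks) and ICTC[E]."""
    if not 0 <= interval_clocks <= 0xFF:
        raise ValueError("ICTC[FI] is an 8-bit field")
    # FI ends at bit 30 (shift 31 - 30 = 1); E is bit 31 (shift 0).
    return (interval_clocks << 1) | int(enable)
```

For example, `pack_ictc(0xFF, True)` requests the longest forwarding interval with throttling enabled.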


2.1.4 Thermal Management Registers (THRM1-THRM3)


The on-chip thermal management assist unit provides the following functions:
• Compares the junction temperature against user-programmed thresholds
• Generates a thermal management interrupt if the temperature crosses the threshold
• Provides a way for a successive approximation routine to estimate junction
temperature
Control and access to the thermal management assist unit is through the privileged
mtspr/mfspr instructions to the three THRM registers. THRM1 and THRM2, shown in 
Figure 2-10, provide the ability to compare the junction temperature against two 
user-provided thresholds. Having dual thresholds allows thermal management software 
differing degrees of action in reducing junction temperature. Thermal management can use 
a single-threshold mode in which the thermal sensor output is compared to only one 
threshold in either THRM1 or THRM2. 





Figure 2-10. Thermal Management Registers 1-2 (THRM1-THRM2)


The bits in THRM1 and THRM2 are described in Table 2-15. 
Table 2-15. THRM1-THRM2 Bit Settings

Bits | Field | Description

0 TIN | Thermal management interrupt bit. Read-only. This bit is set if the thermal sensor output crosses
the threshold specified in the SPR. The state of TIN is valid only if TIV is set. The interpretation of
TIN is controlled by TID. See Table 2-16.

1 TIV | Thermal management interrupt valid. Read-only. This bit is set by the thermal assist logic to indicate
that the thermal management interrupt (TIN) state is valid. See Table 2-16.

2-8 Threshold | Threshold that the thermal sensor output is compared to. The range is 0-127°C, and each bit
represents 1°C. Note that this is not the resolution of the thermal sensor.

9-28 — | Reserved. System software should clear these bits when writing to the THRMn SPRs.

29 TID | Thermal management interrupt direction bit. Selects the result of the temperature comparison to set
TIN and to assert a thermal management interrupt if TIE is set. If TID is cleared, TIN is set and an
interrupt occurs if the junction temperature exceeds the threshold. If TID is set, TIN is set and an
interrupt is indicated if the junction temperature is below the threshold. See Table 2-16.

30 TIE | Thermal management interrupt enable. The thermal management interrupt is maskable by the
MSR[EE] bit. If TIE is cleared and THRMn is valid, the TIN bit records the status of the junction
temperature vs. threshold comparison without causing an exception. This lets system software
successively approximate the junction temperature. See Table 2-16.

31 V | SPR valid bit. Setting this bit indicates that the SPR contains a valid threshold and valid TID and TIE
control bits. THRM1/2[V] = 1 and THRM3[E] = 1 enable the thermal sensor operation. See Table 2-16.














If an mtspr affects a THRM register that contains operating parameters for an ongoing 
comparison during operation of the thermal assist unit, the respective TIV bits are cleared 
and the comparison is restarted. Changing THRM3 forces the TIV bits of both THRM1 and 
THRM2 to 0, and restarts the comparison if THRM3[E] is set. 
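The THRM1/THRM2 layout in Table 2-15 can be sketched as a pair of helpers: one that builds the value software writes, and one that reads TIN only when TIV says it is valid. This is an illustrative Python sketch; the names `pack_thrm` and `tin_if_valid` are assumptions, and reserved bits 9-28 are written as zero as the table recommends.

```python
THRM_TIN = 1 << 31  # bit 0, read-only status
THRM_TIV = 1 << 30  # bit 1, read-only status
THRM_TID = 1 << 2   # bit 29
THRM_TIE = 1 << 1   # bit 30
THRM_V = 1 << 0     # bit 31


def pack_thrm(threshold_c: int, tid: bool = False, tie: bool = False,
              valid: bool = True) -> int:
    """Build a THRM1/THRM2 image; the threshold occupies bits 2-8 (shift 31 - 8 = 23)."""
    if not 0 <= threshold_c <= 127:
        raise ValueError("threshold must be 0-127 degrees C")
    value = threshold_c << 23
    if tid:
        value |= THRM_TID
    if tie:
        value |= THRM_TIE
    if valid:
        value |= THRM_V
    return value


def tin_if_valid(thrm: int):
    """Return TIN as a bool, or None while TIV is clear and TIN is not yet meaningful."""
    if not thrm & THRM_TIV:
        return None
    return bool(thrm & THRM_TIN)
```

Because a write to a THRM register clears TIV and restarts the comparison, a caller would poll until `tin_if_valid` stops returning None before trusting the result.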


Examples of valid THRM1/THRM2 bit settings are shown in Table 2-16. 
Table 2-16. Valid THRM1/THRM2 States

TIN(1) | TIV(1) | TID | TIE | V | Description
x      | x      | x   | x   | 0 | Invalid entry. The threshold in the SPR is not used for comparison.
x      | x      | x   | 0   | 1 | Disable thermal management interrupt assertion.
x      | x      | 0   | x   | 1 | Set TIN and assert thermal management interrupt if TIE = 1 and the junction temperature exceeds the threshold.
x      | x      | 1   | x   | 1 | Set TIN and assert thermal management interrupt if TIE = 1 and the junction temperature is less than the threshold.
x      | 0      | x   | x   | 1 | The state of the TIN bit is not valid.
0      | 1      | 0   | x   | 1 | The junction temperature is less than the threshold and as a result the thermal management interrupt is not generated for TIE = 1.
1      | 1      | 0   | x   | 1 | The junction temperature is greater than the threshold and as a result the thermal management interrupt is generated if TIE = 1.
0      | 1      | 1   | x   | 1 | The junction temperature is greater than the threshold and as a result the thermal management interrupt is not generated for TIE = 1.
1      | 1      | 1   | x   | 1 | The junction temperature is less than the threshold and as a result the thermal management interrupt is generated if TIE = 1.

Note:
(1) TIN and TIV are read-only status bits.
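The successive approximation mentioned above can be done as a binary search over the 7-bit threshold, with TIE cleared so that TIN only records the comparison. The Python sketch below shows the search logic alone. The callback `tin_for_threshold` is hypothetical: it stands in for code that writes a threshold with TID = 0 and V = 1, waits for TIV, and returns TIN (true when the junction temperature exceeds the threshold).

```python
from typing import Callable


def estimate_junction_temp(tin_for_threshold: Callable[[int], bool]) -> int:
    """Binary-search the lowest threshold (0-127) that the temperature does not exceed."""
    lo, hi = 0, 127
    while lo < hi:
        mid = (lo + hi) // 2
        if tin_for_threshold(mid):
            # Temperature is above mid, so the answer lies in the upper half.
            lo = mid + 1
        else:
            # Temperature is at or below mid, so mid is still a candidate.
            hi = mid
    return lo
```

The search needs at most seven comparisons. A result of 127 may mean the temperature is at or above the top of the range, and the estimate is limited by the sensor's own resolution, which the table notes is not 1°C.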


The THRM3 register, shown in Figure 2-11, is used to enable the thermal assist unit and to 
control the comparator output sample time. The thermal assist logic manages the thermal 
management interrupt generation and time-multiplexed comparisons in dual-threshold 
mode as well as other control functions. 





Figure 2-11. Thermal Management Register 3 (THRM3)


The bits in THRM3 are described in Table 2-17. 


Table 2-17. THRM3 Bit Settings

Bits | Name | Description

0-17 — | Reserved for future use. System software should clear these bits when writing to THRM3.

18-30 SITV | Sample interval timer value. Number of elapsed processor clock cycles before a junction
temperature vs. threshold comparison result is sampled for TIN bit setting and interrupt generation.
This is necessary because the settling time of the thermal sensor, DAC, and analog comparator is
greater than the processor cycle time. The value should be configured to allow a sampling interval
of 20 microseconds.

31 E | Enables the thermal sensor compare operation if either THRM1[V] or THRM2[V] is set.



The THRM registers can be accessed with the mtspr and mfspr instructions using the 
following SPR numbers: 


• THRM1 is SPR 1020 
• THRM2 is SPR 1021 
• THRM3 is SPR 1022 
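

As an illustration of the THRM3 layout, the following C sketch computes a SITV value for the 
20-microsecond sampling interval and packs it with the E bit. The function names, the rounding 
up, and the clamping to the 13-bit field are assumptions made for this example, not part of 
the register definition. The resulting word would be written with mtspr to SPR 1022. 

```c
#include <stdint.h>

/* PowerPC numbers bits from the most-significant end, so bit 31 is the LSB. */
#define THRM3_E          0x00000001u  /* bit 31: enable compare operation */
#define THRM3_SITV_SHIFT 1            /* SITV occupies bits 18-30 */
#define THRM3_SITV_MAX   0x1FFFu      /* 13-bit field */

/* Processor clock cycles in 20 microseconds, rounded up.
 * Clamping to the field width is an assumption of this sketch. */
static uint32_t thrm3_sitv_for_clock(uint32_t core_hz)
{
    uint64_t cycles = ((uint64_t)core_hz * 20u + 999999u) / 1000000u;
    if (cycles > THRM3_SITV_MAX)
        cycles = THRM3_SITV_MAX;
    return (uint32_t)cycles;
}

/* Build the THRM3 word; reserved bits 0-17 stay cleared. */
static uint32_t thrm3_value(uint32_t core_hz, int enable)
{
    uint32_t sitv = thrm3_sitv_for_clock(core_hz);
    return (sitv << THRM3_SITV_SHIFT) | (enable ? THRM3_E : 0u);
}
```

For a 400-MHz core clock, 20 microseconds corresponds to 8000 cycles, which fits in the 13-bit 
SITV field. Clock rates above roughly 409 MHz would need more than 8191 cycles, so the sketch 
clamps the value and the sampled interval becomes shorter than 20 microseconds. 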


2.1.5 L2 Cache Control Register (L2CR) 


The L2 cache control register, shown in Figure 2-12, is a supervisor-level, 
implementation-specific SPR used to configure and operate the L2 cache. It is cleared by a 
hard reset or power-on reset. 





[Register fields: L2E (bit 0), L2PE (1), L2SIZ (2–3), L2CLK (4–6), L2RAM (7–8), L2DO (9), 
L2I (10), L2CTL (11), L2WT (12), L2TS (13), L2OH (14–15), L2SL (16), L2DF (17), L2BYP (18), 
Reserved (19–30), L2IP (31)] 


Figure 2-12. L2 Cache Control Register (L2CR) 


The L2 cache interface is described in Chapter 9, “L2 Cache Interface Operation.” The 
L2CR bits are described in Table 2-18. 


Table 2-18. L2CR Bit Settings 

Bits | Name | Function 
0 | L2E | L2 enable. Enables L2 cache operation (including snooping) starting with the next transaction the L2 cache unit receives. Before enabling the L2 cache, the L2 clock must be configured through L2CR[L2CLK], and the L2 DLL must stabilize (see the hardware specifications). All other L2CR bits must be set appropriately. The L2 cache may need to be invalidated globally. 
1 | L2PE | L2 data parity generation and checking enable. Enables parity generation and checking for the L2 data RAM interface. When disabled, generated parity is always zeros. 
2–3 | L2SIZ | L2 size. Should be set according to the size of the L2 data RAMs used. A 256-Kbyte L2 cache requires a data RAM configuration of 32 Kbytes x 64 bits; a 512-Kbyte L2 cache requires a configuration of 64 Kbytes x 64 bits; a 1-Mbyte L2 cache requires a configuration of 128K x 64 bits. 
    00 Reserved 
    01 256 Kbyte 
    10 512 Kbyte 
    11 1 Mbyte 





4–6 | L2CLK | L2 clock ratio (core-to-L2 frequency divider). Specifies the clock divider ratio, relative to the core clock frequency, at which the L2 data RAM interface operates. When these bits are cleared, the L2 clock is stopped and the on-chip DLL for the L2 interface is disabled. For nonzero values, the processor generates the L2 clock and the on-chip DLL is enabled. After the L2 clock ratio is chosen, the DLL must stabilize before the L2 interface can be enabled (see the hardware specifications). The resulting L2 clock frequency cannot be slower than the clock frequency of the 60x bus interface. 
    000 L2 clock and DLL disabled 
    001 ÷1 
    010 ÷1.5 
    011 Reserved 
    100 ÷2 
    101 ÷2.5 
    110 ÷3 
    111 Reserved 
7–8 | L2RAM | L2 RAM type. Configures the L2 RAM interface for the type of synchronous SRAMs used: 
    • Flow-through (register-buffer) synchronous burst SRAMs that clock addresses in and flow data out 
    • Pipelined (register-register) synchronous burst SRAMs that clock addresses in and clock data out 
    • Late-write synchronous SRAMs, for which the MPC750 requires a pipelined (register-register) configuration. Late-write RAMs require write data to be valid on the cycle after WE is asserted, rather than on the same cycle as the write enable as with traditional burst RAMs. 
    For burst RAM selections, the MPC750 does not burst data into the L2 cache; it generates an address for each access. Pipelined SRAMs may be used for all L2 clock modes. Note that flow-through SRAMs can be used only for L2 clock modes divide-by-2 or slower (divide-by-1 and divide-by-1.5 not allowed). 
    00 Flow-through (register-buffer) synchronous burst SRAM 
    01 Reserved 
    10 Pipelined (register-register) synchronous burst SRAM 
    11 Pipelined (register-register) synchronous late-write SRAM 
9 | L2DO | L2 data-only. Setting this bit enables data-only operation in the L2 cache. For this operation, only transactions from the L1 data cache can be cached in the L2 cache, which treats all transactions from the L1 instruction cache as cache-inhibited (bypass L2 cache, no L2 checking done). This bit is provided for L2 testing only. 
10 | L2I | L2 global invalidate. Setting L2I invalidates the L2 cache globally by clearing the L2 bits including status bits. This bit must not be set while the L2 cache is enabled. 
11 | L2CTL | L2 RAM control (ZZ enable). Setting L2CTL enables the automatic operation of the L2ZZ (low-power mode) signal for cache RAMs that support the ZZ function. While L2CTL is asserted, L2ZZ asserts automatically when the MPC750 enters nap or sleep mode and negates automatically when the MPC750 exits nap or sleep mode. This bit should not be set when the MPC750 is in nap mode and snooping is to be performed through deassertion of QACK. 
12 | L2WT | L2 write-through. Setting L2WT selects write-through mode (rather than the default write-back mode) so all writes to the L2 cache also write through to the 60x bus. For these writes, the L2 cache entry is always marked as clean (valid unmodified) rather than dirty (valid modified). This bit must never be asserted after the L2 cache has been enabled, because previously modified lines can get remarked as clean during normal operation. 
13 | L2TS | L2 test support. Setting L2TS causes cache block pushes from the L1 data cache that result from dcbf and dcbst instructions to be written only into the L2 cache and marked valid, rather than being written only to the 60x bus and marked invalid in the L2 cache in case of hit. This bit allows a dcbz/dcbf instruction sequence to be used with the L1 cache enabled to easily initialize the L2 cache with any address and data information. This bit also keeps dcbz instructions from being broadcast on the 60x bus and single-beat cacheable store misses in the L2 from being written to the 60x bus. 





14–15 | L2OH | L2 output hold. These bits configure output hold time for address, data, and control signals driven by the MPC750 to the L2 data RAMs. They should generally be set according to the SRAM’s input hold time requirements, for which late-write SRAMs usually differ from flow-through or burst SRAMs. 
    00 0.5 ns 
    01 1.0 ns 
    1x Reserved 
16 | L2SL | L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay line. It is intended to increase the delay through the DLL to accommodate slower L2 RAM bus frequencies. Generally, L2SL should be set if the L2 RAM interface is operated below 150 MHz. 
17 | L2DF | L2 differential clock. Setting L2DF configures the two clock-out signals (L2CLK_OUTA and L2CLK_OUTB) of the L2 interface to operate as one differential clock. In this mode, the B clock is driven as the logical complement of the A clock. This mode supports the differential clock requirements of late-write SRAMs. Generally, this bit should be set when late-write SRAMs are used. 
18 | L2BYP | L2 DLL bypass. The DLL unit receives three input clocks: 
    1. A square-wave clock from the PLL unit to phase adjust and export 
    2. A non-square-wave clock for the internal phase reference 
    3. A feedback clock (L2SYNC_IN) for the external phase reference 
    Asserting L2BYP causes clock #2 to be used as clocks #1 and #2. (Clock #2 is the actual clock used by the registers of the L2 interface circuitry.) L2BYP is intended for use when the PLL is being bypassed, and for engineering evaluation. If the PLL is being bypassed, the DLL must be operated in divide-by-1 mode, and SYSCLK must be fast enough for the DLL to support. 
19–30 | | Reserved. These bits are implemented but not used; keep at 0 for future compatibility. 
31 | L2IP | L2 global invalidate in progress (read only). This read-only bit indicates whether an L2 global invalidate is occurring. It should be monitored after an L2 global invalidate has been initiated by the L2I bit to determine when it has completed. 

The L2CR register can be accessed with the mtspr and mfspr instructions using SPR 1017. 
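

The constraints in Table 2-18 imply an ordering for bringing up the L2 cache: configure the 
clock and RAM type, wait for the DLL, invalidate globally while the cache is disabled, poll 
L2IP, and only then set L2E. The C sketch below illustrates that ordering under stated 
assumptions. The helper names are hypothetical, the register is simulated by a variable so 
that the code can run off-target, and clearing L2I before setting L2E is this sketch's way of 
honoring the rule that L2I must not be set while the cache is enabled. On hardware, 
l2cr_read and l2cr_write would wrap mfspr and mtspr with SPR 1017. 

```c
#include <stdint.h>

/* PowerPC bit 0 is the most-significant bit of the 32-bit register. */
#define L2CR_L2E       0x80000000u  /* bit 0: L2 enable */
#define L2CR_SIZ_SHIFT 28           /* bits 2-3: L2SIZ */
#define L2CR_CLK_SHIFT 25           /* bits 4-6: L2CLK */
#define L2CR_RAM_SHIFT 23           /* bits 7-8: L2RAM */
#define L2CR_L2I       0x00200000u  /* bit 10: global invalidate */
#define L2CR_L2IP      0x00000001u  /* bit 31: invalidate in progress */

/* Simulated register (hypothetical stand-in for SPR 1017 accesses). */
static uint32_t sim_l2cr;
static uint32_t l2cr_read(void) { return sim_l2cr; }
static void l2cr_write(uint32_t v) { sim_l2cr = v & ~L2CR_L2IP; } /* L2IP is read-only */
static void l2_wait_dll_stable(void) { /* delay per the hardware specifications */ }

/* Build a configuration word from raw field encodings (Table 2-18). */
static uint32_t l2cr_compose(unsigned siz, unsigned clk, unsigned ram)
{
    return ((uint32_t)(siz & 3u) << L2CR_SIZ_SHIFT) |
           ((uint32_t)(clk & 7u) << L2CR_CLK_SHIFT) |
           ((uint32_t)(ram & 3u) << L2CR_RAM_SHIFT);
}

/* Reject reserved encodings and flow-through SRAM at divide-by-1 or 1.5. */
static int l2cr_config_valid(uint32_t v)
{
    unsigned siz = (v >> L2CR_SIZ_SHIFT) & 3u;
    unsigned clk = (v >> L2CR_CLK_SHIFT) & 7u;
    unsigned ram = (v >> L2CR_RAM_SHIFT) & 3u;

    if (siz == 0u) return 0;                            /* 00 is reserved */
    if (clk == 0u || clk == 3u || clk == 7u) return 0;  /* stopped or reserved */
    if (ram == 1u) return 0;                            /* 01 is reserved */
    if (ram == 0u && clk < 4u) return 0;                /* flow-through needs /2 or slower */
    return 1;
}

/* Returns 0 on success, -1 if the configuration is rejected. */
static int l2_enable(uint32_t config)
{
    config &= ~(L2CR_L2E | L2CR_L2I | L2CR_L2IP);
    if (!l2cr_config_valid(config)) return -1;

    l2cr_write(config);              /* set clock and other fields, L2E = 0 */
    l2_wait_dll_stable();            /* the DLL must stabilize first */
    l2cr_write(config | L2CR_L2I);   /* global invalidate, cache still disabled */
    while (l2cr_read() & L2CR_L2IP)  /* poll until the invalidate completes */
        ;
    l2cr_write(config);              /* clear L2I before enabling */
    l2cr_write(config | L2CR_L2E);   /* enable the L2 cache */
    return 0;
}
```

For example, l2cr_compose(3, 4, 2) describes a 1-Mbyte cache of pipelined burst SRAMs clocked 
at divide-by-2. The same size with flow-through SRAMs at divide-by-1 is rejected by 
l2cr_config_valid. 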


2.1.6 Reset Settings 


Table 2-19 shows the state of the registers and other resources after a hard reset and before 
the first instruction is fetched from address 0xFFF0_0100 (the system reset exception 
vector). 


Table 2-19. Settings Caused by Hard Reset (Used at Power-On) 

Resource | Setting | Resource | Setting 
BATs | Undefined | MSR | 0x0000_0040 (only IP set) 
Caches (L1/L2)* | Invalidated and disabled | PMCn | Undefined 
CR | Undefined | PVR | ROM value 
CTR | Undefined | Reservation address | Undefined 
DABR | Breakpoint is disabled. Address is undefined. | Reservation flag | Cleared 
DAR | 0x0000_0000 | SDR1 | 0x0000_0000 
DEC | 0xFFFF_FFFF | SIA | 0x0000_0000 
DSISR | 0x0000_0000 | SPRG0–SPRG3 | 0x0000_0000 
EAR | 0x0000_0000 | SRs | Undefined 
FPR | Undefined | SRR0 | 0x0000_0000 
FPSCR | 0x0000_0000 | SRR1 | 0x0000_0000 
GPR | Undefined | TBU and TBL | 0x0000_0000 
HID0 | 0x0000_0000 | THRM1–THRM3 | 0x0000_0000 
HID1 | 0x0000_0000 | TLB | Undefined 
IABR | 0x0000_0000 (Breakpoint is disabled.) | UMMCRn | 0x0000_0000 
ICTC | 0x0000_0000 | UPMCn | 0x0000_0000 
L2CR | 0x0000_0000 | USIA | 0x0000_0000 
LR | 0x0000_0000 | XER | 0x0000_0000 
MMCRn | 0x0000_0000 | | 

* The processor automatically begins operations by issuing an instruction fetch. Because caching is inhibited at 
start-up, this generates a single-beat load operation on the bus. 
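

The reset value of the MSR and the address of the first fetch are related: with only MSR[IP] 
set, exception vectors use the 0xFFFn_nnnn prefix defined by the PowerPC architecture, so the 
system reset vector (offset 0x00100) resolves to 0xFFF0_0100. The following C sketch shows the 
calculation. The macro and function names are illustrative and not taken from this manual. 

```c
#include <stdint.h>

#define MSR_IP            0x00000040u  /* MSR bit 25: exception prefix */
#define VEC_SYSTEM_RESET  0x00000100u  /* system reset vector offset */
#define VEC_PROGRAM       0x00000700u  /* program exception vector offset */

/* Physical address of an exception vector for a given MSR value. */
static uint32_t exception_vector(uint32_t msr, uint32_t offset)
{
    uint32_t prefix = (msr & MSR_IP) ? 0xFFF00000u : 0x00000000u;
    return prefix | offset;
}
```

With the hard-reset MSR value 0x0000_0040, exception_vector(0x00000040, VEC_SYSTEM_RESET) 
gives 0xFFF0_0100, matching the fetch address stated above. 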


2.2 Operand Conventions 


This section describes the operand conventions as they are represented in two levels of the 
PowerPC architecture—UISA and VEA. Detailed descriptions are provided of conventions 
used for storing values in registers and memory, accessing PowerPC registers, and 
representation of data in these registers. 


2.2.1 Floating-Point Execution Models—UISA 


The IEEE 754 standard defines conventions for 64- and 32-bit arithmetic. The standard 
requires that single-precision arithmetic be provided for single-precision operands. The 
standard permits double-precision arithmetic instructions to have either (or both) 
single-precision or double-precision operands, but states that single-precision arithmetic 
instructions should not accept double-precision operands. 


The PowerPC UISA follows these guidelines: 


• Double-precision arithmetic instructions may have single-precision operands but 
always produce double-precision results. 


• Single-precision arithmetic instructions require all operands to be single-precision 
and always produce single-precision results. 


For arithmetic instructions, conversion from double- to single-precision must be done 
explicitly by software, while conversion from single- to double-precision is done implicitly 
by the processor. 
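

The asymmetry between the two conversions can be seen in portable C, where float and double 
are assumed to map to the IEEE 754 single and double formats. Widening is exact, while 
narrowing rounds and may lose information, which is why it has to be requested explicitly 
(on PowerPC, with the frsp instruction). The function names below are illustrative only. 

```c
#include <stdint.h>

/* Single to double: every single-precision value is exactly representable. */
static double widen(float s) { return (double)s; }

/* Double to single: rounds to the nearest single-precision value. */
static float narrow(double d) { return (float)d; }

/* Nonzero if d can be narrowed and widened again without change. */
static int survives_narrowing(double d) { return widen(narrow(d)) == d; }
```

A value such as 0.5 survives the round trip, whereas the double-precision value nearest to 
0.1 does not. 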
All implementations provide the equivalent of the following execution models to ensure 
that identical results are obtained. The definition of the arithmetic instructions for infinities, 
denormalized numbers, and NaNs follows conventions described in the following sections. 


Although the double-precision format specifies an 11-bit exponent, exponent arithmetic 
uses two additional bit positions to avoid potential transient overflow conditions. An extra 
bit is required when denormalized double-precision numbers are prenormalized. A second 
bit is required to permit computation of the adjusted exponent value in the following 
examples when the corresponding exception enable bit is one: 


• Underflow during multiplication using a denormalized operand 


• Overflow during division using a denormalized divisor 


2.2.2 Data Organization in Memory and Data Transfers 


Bytes in memory are numbered consecutively starting with 0. Each number is the address 
of the corresponding byte. 


Memory operands may be bytes, half words, words, or double words, or, for the load/store 
multiple and load/store string instructions, a sequence of bytes or words. The address of a 
memory operand is the address of its first byte (that is, of its lowest-numbered byte). 
Operand length is implicit for each instruction. 


2.2.3 Alignment and Misaligned Accesses 


The operand of a single-register memory access instruction has an alignment boundary 
equal to its length. An operand’s address is misaligned if it is not a multiple of its width. 
Operands for single-register memory access instructions have the characteristics shown in 
Table 2-20. Although not permitted as memory operands, quad words are shown because 
quad-word alignment is desirable for certain memory operands. 


The concept of alignment is also applied more generally to data in memory. For example, 
a 12-byte data item is said to be word-aligned if its address is a multiple of four. 
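

The alignment rule reduces to a divisibility test on the address. A minimal C sketch, with a 
hypothetical function name, is: 

```c
#include <stdint.h>

/* Nonzero if address ea is a multiple of boundary (in bytes).
 * boundary is the operand length for a single-register access,
 * for example 2 for a half word, 4 for a word, 8 for a double word. */
static int is_aligned(uint32_t ea, uint32_t boundary)
{
    return (ea % boundary) == 0u;
}
```

Under this test a 12-byte data item at address 0x1004 is word-aligned but not 
double-word-aligned. 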


Some instructions require their memory operands to have certain alignment. In addition, 
alignment may affect performance. For single-register memory access instructions, the best 
performance is obtained when memory operands are aligned. 


Instructions are 32 bits (one word) long and must be word-aligned. 


The MPC750 does not provide hardware support for floating-point memory operands that are 
not word-aligned. If a floating-point operand is not aligned, the MPC750 invokes an alignment 
exception, and software must break up the offending storage access operation 
appropriately. In addition, some non-double-word-aligned memory accesses suffer 
performance degradation compared to an aligned access of the same type. 


In general, floating-point word accesses should always be word-aligned and floating-point 
double-word accesses should always be double-word-aligned. Frequent use of misaligned 
accesses is discouraged because they can degrade overall performance. 


2.2.4 Floating-Point Operand 


The MPC750 provides hardware support for all single- and double-precision floating-point 
operations for most value representations and all rounding modes. This architecture 
provides for hardware to implement a floating-point system as defined in ANSI/IEEE 
standard 754-1985, IEEE Standard for Binary Floating Point Arithmetic. Detailed 
information about the floating-point execution model can be found in Chapter 3, “Operand 
Conventions,” in the Programming Environments Manual. 


The MPC750 supports non-IEEE mode whenever FPSCR[29] (NI) is set. In this mode, 
denormalized numbers, NaNs, and some IEEE invalid operations are treated in a non-IEEE 
conforming manner. This is accomplished by delivering results that approximate the values 
required by the IEEE standard. Table 2-20 summarizes the conditions and mode behavior 
for operands. 


Table 2-20. Floating-Point Operand Data Type Behavior 

Operand A Data Type | Operand B Data Type | Operand C Data Type | IEEE Mode (NI = 0) | Non-IEEE Mode (NI = 1) 
Single or double denormalized | Single or double denormalized | Single or double denormalized | Normalize all three | Zero all three 
Single or double denormalized | Single or double denormalized | Normalized or zero | Normalize A and B | Zero A and B 
Normalized or zero | Single or double denormalized | Single or double denormalized | Normalize B and C | Zero B and C 
Single or double denormalized | Normalized or zero | Single or double denormalized | Normalize A and C | Zero A and C 
Single or double denormalized | Normalized or zero | Normalized or zero | Normalize A | Zero A 
Normalized or zero | Single or double denormalized | Normalized or zero | Normalize B | Zero B 
Normalized or zero | Normalized or zero | Single or double denormalized | Normalize C | Zero C 
Single or double QNaN or SNaN | Don’t care | Don’t care | QNaN¹ | QNaN¹ 
Don’t care | Single or double QNaN or SNaN | Don’t care | QNaN¹ | QNaN¹ 
Don’t care | Don’t care | Single or double QNaN or SNaN | QNaN¹ | QNaN¹ 
Single or double normalized, infinity, or zero | Single or double normalized, infinity, or zero | Single or double normalized, infinity, or zero | Do the operation | Do the operation 

¹ Prioritize according to Chapter 3, “Operand Conventions,” in the Programming Environments Manual. 


Table 2-21 summarizes the mode behavior for results. 


Table 2-21. Floating-Point Result Data Type Behavior 

Precision | Data Type | IEEE Mode (NI = 0) | Non-IEEE Mode (NI = 1) 
Single | Denormalized | Return single-precision denormalized number with trailing zeros. | Return zero. 
Single | Normalized, infinity, zero | Return the result. | Return the result. 
Single | QNaN, SNaN | Return QNaN. | Return QNaN. 
Single | INT | Place integer into low word of FPR. | If (Invalid Operation) then place (0x8000) into FPR[32–63], else place integer into FPR[32–63]. 
Double | Denormalized | Return double-precision denormalized number. | Return zero. 
Double | Normalized, infinity, zero | Return the result. | Return the result. 
Double | QNaN, SNaN | Return QNaN. | Return QNaN. 
Double | INT | Not supported by MPC750 | Not supported by MPC750 
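

The denormalized rows of Table 2-21 can be modeled in a few lines of C. This is only a 
software illustration of the table, not a description of the hardware data path. The 
function names are hypothetical, and preserving the sign of the zero result is an assumption 
of this sketch, since the table states only that zero is returned. 

```c
#include <math.h>
#include <stdint.h>

#define FPSCR_NI 0x00000004u  /* FPSCR[29]: non-IEEE mode */

/* Nonzero if the NI bit is set in an FPSCR image. */
static int fpscr_ni_set(uint32_t fpscr) { return (fpscr & FPSCR_NI) != 0u; }

/* Apply the denormalized-result rule: in non-IEEE mode a denormalized
 * result is replaced by zero; all other results pass through. */
static double apply_ni_mode(double result, uint32_t fpscr)
{
    if (fpscr_ni_set(fpscr) && fpclassify(result) == FP_SUBNORMAL)
        return signbit(result) ? -0.0 : 0.0;  /* sign handling is an assumption */
    return result;
}
```

The same test could be applied to operands to model the "Zero A", "Zero B", and "Zero C" 
entries of Table 2-20. 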




















2.3 Instruction Set Summary 


This chapter describes instructions and addressing modes defined for the MPC750. These 
instructions are divided into the following functional categories: 


• Integer instructions: These include arithmetic and logical instructions. For more 
information, see Section 2.3.4.1, “Integer Instructions.” 

• Floating-point instructions: These include floating-point arithmetic instructions, as 
well as instructions that affect the floating-point status and control register (FPSCR). 
For more information, see Section 2.3.4.2, “Floating-Point Instructions.” 

• Load and store instructions: These include integer and floating-point load and store 
instructions. For more information, see Section 2.3.4.3, “Load and Store 
Instructions.” 


• Flow control instructions: These include branching instructions, condition register 
logical instructions, trap instructions, and other instructions that affect the 
instruction flow. For more information, see Section 2.3.4.4, “Branch and Flow 
Control Instructions.” 


• Processor control instructions: These instructions are used for synchronizing 
memory accesses and managing caches, TLBs, and segment registers. For more 
information, see Section 2.3.4.6, “Processor Control Instructions—UISA,” 
Section 2.3.5.1, “Processor Control Instructions—VEA,” and Section 2.3.6.2, 
“Processor Control Instructions—OEA.” 


• Memory synchronization instructions: These instructions are used for memory 
synchronization. See Section 2.3.4.7, “Memory Synchronization 
Instructions—UISA,” Section 2.3.5.2, “Memory Synchronization 
Instructions—VEA,” for more information. 


• Memory control instructions: These instructions provide control of caches, TLBs, 
and segment registers. For more information, see Section 2.3.5.3, “Memory Control 
Instructions—VEA,” and Section 2.3.6.3, “Memory Control Instructions—OEA.” 


• External control instructions: These include instructions for use with special 
input/output devices. For more information, see Section 2.3.5.4, “Optional External 
Control Instructions.” 


Note that this grouping of instructions does not necessarily indicate the execution unit that 
processes a particular instruction or group of instructions. This information, which is useful 
for scheduling instructions most effectively, is provided in Chapter 6, “Instruction Timing.” 


Integer instructions operate on word operands. Floating-point instructions operate on 
single-precision and double-precision floating-point operands. The PowerPC architecture 
uses instructions that are four bytes long and word-aligned. It provides for byte, half-word, 
and word operand loads and stores between memory and a set of 32 general-purpose 
registers (GPRs). It also provides for word and double-word operand loads and stores 
between memory and a set of 32 floating-point registers (FPRs). 


Arithmetic and logical instructions do not read or modify memory. To use the contents of a 
memory location in a computation and then modify the same or another memory location, 
the memory contents must be loaded into a register, modified, and then written to the target 
location using load and store instructions. 


The description of each instruction includes the mnemonic and a formatted list of operands. 
To simplify assembly language programming, a set of simplified mnemonics and symbols 
is provided for some of the frequently-used instructions; see Appendix F, “Simplified 
Mnemonics,” in the Programming Environments Manual for a complete list of simplified 
mnemonics. Note that the architecture specification refers to simplified mnemonics as 
extended mnemonics. Programs written to be portable across the various assemblers for the 
PowerPC architecture should not assume the existence of mnemonics not described in that 
document. 


2.3.1 Classes of Instructions 


The MPC750 instructions belong to one of the following three classes: 
• Defined 
• Illegal 
• Reserved 


Note that while the definitions of these terms are consistent among the processors of this 
family, the assignment of these classifications is not. For example, PowerPC instructions 
defined for 64-bit implementations are treated as illegal by 32-bit implementations such as 
the MPC750. 


The class is determined by examining the primary opcode and the extended opcode, if any. 
If the opcode, or combination of opcode and extended opcode, is not that of a defined 
instruction or of a reserved instruction, the instruction is illegal. 


Instruction encodings that are now illegal may become assigned to instructions in the 
architecture or may be reserved by being assigned to processor-specific instructions. 


2.3.1.1 Definition of Boundedly Undefined 


If instructions are encoded with incorrectly set bits in reserved fields, the results on 
execution can be said to be boundedly undefined. If a user-level program executes the 
incorrectly coded instruction, the resulting undefined results are bounded in that a spurious 
change from user to supervisor state is not allowed, and the level of privilege exercised by 
the program in relation to memory access and other system resources cannot be exceeded. 
Boundedly-undefined results for a given instruction may vary between implementations, 
and between execution attempts in the same implementation. 


2.3.1.2 Defined Instruction Class 


Defined instructions are guaranteed to be supported in all implementations, except as stated 
in the instruction descriptions in Chapter 8, “Instruction Set,” in the Programming 
Environments Manual. The MPC750 provides hardware support for all instructions defined 
for 32-bit implementations. It does not support the optional fsqrt, fsqrts, and tlbia 
instructions. 


A processor invokes the illegal instruction error handler (part of the program exception) 
when it encounters an unimplemented PowerPC instruction so that the instruction can be 
emulated in software, as required. Note that the architecture specification refers to 
exceptions as interrupts. 


A defined instruction can have invalid forms. The MPC750 provides limited support for 
instructions represented in an invalid form. 


2.3.1.3 Illegal Instruction Class 


Illegal instructions can be grouped into the following categories: 


• Instructions not defined in the PowerPC architecture. The following primary opcodes 
are defined as illegal but may be used in future extensions to the architecture: 


1, 4, 5, 6, 9, 22, 56, 57, 60, 61 


Future versions of the PowerPC architecture may define any of these instructions to 
perform new functions. 


• Instructions defined in the PowerPC architecture but not implemented in a specific 
implementation. For example, instructions that can be executed on 64-bit processors 
are considered illegal by 32-bit processors such as the MPC750. 


The following primary opcodes are defined for 64-bit implementations only and are 
illegal on the MPC750: 


2, 30, 58, 62 


• All unused extended opcodes are illegal. The unused extended opcodes can be 
determined from information in Section A.2, “Instructions Sorted by Opcode,” and 
Section 2.3.1.4, “Reserved Instruction Class.” Notice that extended opcodes for 
instructions defined only for 64-bit implementations are illegal in 32-bit 
implementations, and vice versa. The following primary opcodes have unused 
extended opcodes. 


17, 19, 31, 59, 63 (Primary opcodes 30 and 62 are illegal for all 32-bit 
implementations, but as 64-bit opcodes they have some unused extended opcodes.) 


• An instruction consisting of only zeros is guaranteed to be an illegal instruction. This 
increases the probability that an attempt to execute data or uninitialized memory 
invokes the system illegal instruction error handler (a program exception). Note that 
if only the primary opcode consists of all zeros, the instruction is considered a 
reserved instruction, as described in Section 2.3.1.4, “Reserved Instruction Class.” 


The MPC750 invokes the system illegal instruction error handler (a program exception) 
when it detects any instruction from this class or any instructions defined only for 64-bit 
implementations. 


See Section 4.5.7, “Program Exception (0x00700),” for additional information about illegal 
and invalid instruction exceptions. Except for an instruction consisting of binary zeros, 
illegal instructions are available for additions to the PowerPC architecture. 
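

The primary-opcode lists above lend themselves to a simple table lookup. The C sketch below 
classifies an instruction word by its primary opcode (bits 0–5) only. It does not decode 
extended opcodes, so an instruction it reports as "not listed" may still be defined, 
reserved, or illegal. The type and function names are hypothetical. 

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    OPC_NOT_LISTED,    /* needs further decoding of the extended opcode */
    OPC_ILLEGAL,       /* all-zero word or undefined primary opcode */
    OPC_ILLEGAL_64BIT  /* primary opcode defined only for 64-bit implementations */
} opcode_class;

/* The primary opcode is the most-significant six bits (bits 0-5). */
static unsigned primary_opcode(uint32_t insn) { return insn >> 26; }

static int in_list(unsigned op, const uint8_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == op) return 1;
    return 0;
}

static opcode_class classify_primary(uint32_t insn)
{
    static const uint8_t undefined_ops[] = { 1, 4, 5, 6, 9, 22, 56, 57, 60, 61 };
    static const uint8_t only_64bit[]    = { 2, 30, 58, 62 };
    unsigned op = primary_opcode(insn);

    if (insn == 0u) return OPC_ILLEGAL;  /* all zeros is guaranteed illegal */
    if (in_list(op, undefined_ops, sizeof undefined_ops)) return OPC_ILLEGAL;
    if (in_list(op, only_64bit, sizeof only_64bit)) return OPC_ILLEGAL_64BIT;
    return OPC_NOT_LISTED;
}
```

Note that a word whose primary opcode is zero but whose remaining bits are not all zero falls 
into the "not listed" case here, consistent with its treatment as a reserved instruction. 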
2.3.1.4 Reserved Instruction Class 


Reserved instructions are allocated to specific implementation-dependent purposes not 
defined by the PowerPC architecture. Attempting to execute an unimplemented reserved 
instruction invokes the illegal instruction error handler (a program exception). See 
“Program Exception (0x00700),” in Chapter 6, “Exceptions,” in the Programming 
Environments Manual for information about illegal and invalid instruction exceptions. 


The PowerPC architecture defines four types of reserved instructions: 


• Instructions in the POWER architecture not part of the PowerPC UISA. For details 
on POWER architecture incompatibilities and how they are handled by processors 
in this family, see Appendix B, “POWER Architecture Cross Reference,” in the 
Programming Environments Manual. 


• Implementation-specific instructions required for the processor to conform to the 
PowerPC architecture (none of these are implemented in the MPC750) 


• All other implementation-specific instructions 
• Architecturally-allowed extended opcodes 


2.3.2 Addressing Modes 


This section provides an overview of conventions for addressing memory and for 
calculating effective addresses as defined by the PowerPC architecture for 32-bit 
implementations. For more detailed information, see “Conventions,” in Chapter 4, 
“Addressing Modes and Instruction Set Summary,” of the Programming Environments 
Manual. 


2.3.2.1 Memory Addressing 


A program references memory using the effective (logical) address computed by the 
processor when it executes a memory access or branch instruction or when it fetches the 
next sequential instruction. 


Bytes in memory are numbered consecutively starting with zero. Each number is the 
address of the corresponding byte. 


2.3.2.2 Memory Operands 


Memory operands may be bytes, half words, words, or double words, or, for the load/store 
multiple and load/store string instructions, a sequence of bytes or words. The address of a 
memory operand is the address of its first byte (that is, of its lowest-numbered byte). 
Operand length is implicit for each instruction. The PowerPC architecture supports both 
big-endian and little-endian byte ordering. The default byte and bit ordering is big-endian. 
See “Byte Ordering,” in Chapter 3, “Operand Conventions,” of the Programming 
Environments Manual for more information about big- and little-endian byte ordering. 
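

To make the default big-endian ordering concrete, the following C sketch stores and loads a 
word in a byte array standing in for memory. The most-significant byte lands at the operand's 
address, which is its lowest-numbered byte. The function names and the byte-array model are 
illustrative only. 

```c
#include <stdint.h>

/* Store a 32-bit word at address ea in big-endian byte order. */
static void store_word_be(uint8_t *mem, uint32_t ea, uint32_t value)
{
    mem[ea]     = (uint8_t)(value >> 24);  /* most-significant byte first */
    mem[ea + 1] = (uint8_t)(value >> 16);
    mem[ea + 2] = (uint8_t)(value >> 8);
    mem[ea + 3] = (uint8_t)value;          /* least-significant byte last */
}

/* Load a 32-bit big-endian word from address ea. */
static uint32_t load_word_be(const uint8_t *mem, uint32_t ea)
{
    return ((uint32_t)mem[ea] << 24) | ((uint32_t)mem[ea + 1] << 16) |
           ((uint32_t)mem[ea + 2] << 8) | (uint32_t)mem[ea + 3];
}
```

Storing 0x11223344 at address 4 places 0x11 at byte 4 and 0x44 at byte 7. 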
The operand of a single-register memory access instruction has a natural alignment 
boundary equal to the operand length. In other words, the “natural” address of an operand 
is an integral multiple of the operand length. A memory operand is said to be aligned if it 
is aligned at its natural boundary; otherwise it is misaligned. For a detailed discussion about 
memory operands, see Chapter 3, “Operand Conventions,” of the Programming 
Environments Manual. 


2.3.2.3 Effective Address Calculation 


An effective address is the 32-bit sum computed by the processor when executing a 
memory access or branch instruction or when fetching the next sequential instruction. For 
a memory access instruction, if the sum of the effective address and the operand length 
exceeds the maximum effective address, the memory operand is considered to wrap around 
from the maximum effective address through effective address 0, as described in the 
following paragraphs. 


Effective address computations for both data and instruction accesses use 32-bit unsigned 
binary arithmetic. A carry from bit 0 is ignored. 


Load and store operations have the following modes of effective address generation: 
• EA = (rA|0) + offset (including offset = 0) (register indirect with immediate index) 
• EA = (rA|0) + rB (register indirect with index) 


Refer to Section 2.3.4.3.2, “Integer Load and Store Address Generation,” for a detailed 
description of effective address generation for load and store operations. 
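

The two load/store addressing modes can be modeled with 32-bit unsigned arithmetic, which 
discards the carry out of bit 0 automatically. In the C sketch below, the notation (rA|0) is 
taken to mean the contents of GPR rA, or the value 0 when the rA field is 0, and the 
immediate offset is treated as a sign-extended 16-bit value as in the architecture's D-form 
instructions. The function names and the register-array model are assumptions of this 
example. 

```c
#include <stdint.h>

/* (rA|0): the contents of GPR rA, or 0 if the rA field is 0. */
static uint32_t ra_or_zero(const uint32_t gpr[32], unsigned ra)
{
    return (ra == 0u) ? 0u : gpr[ra];
}

/* Register indirect with immediate index: EA = (rA|0) + offset. */
static uint32_t ea_immediate(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    /* Sign-extend the offset, then add modulo 2^32 (carry ignored). */
    return ra_or_zero(gpr, ra) + (uint32_t)(int32_t)d;
}

/* Register indirect with index: EA = (rA|0) + rB. */
static uint32_t ea_indexed(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    return ra_or_zero(gpr, ra) + gpr[rb];
}
```

With r3 holding 0xFFFF_FFFC, an immediate offset of 8 yields an effective address of 4, which 
shows the wraparound through address 0. 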


Branch instructions have three categories of effective address generation: 
• Immediate 
• Link register indirect 
• Count register indirect 


2.3.2.4 Synchronization 


The synchronization described in this section refers to the state of the processor that is 
performing the synchronization. 


2.3.2.4.1 Context Synchronization 


The System Call (sc) and Return from Interrupt (rfi) instructions perform context 
synchronization by allowing previously issued instructions to complete before performing 
a change in context. Execution of one of these instructions ensures the following: 
• No higher priority exception exists (sc). 
• All previous instructions have completed to a point where they can no longer cause 
an exception. If a prior memory access instruction causes direct-store error 
exceptions, the results are guaranteed to be determined before this instruction is 
executed. 


• Previous instructions complete execution in the context (privilege, protection, and 
address translation) under which they were issued. 


• The instructions following the sc or rfi instruction execute in the context established 
by these instructions. 


2.3.2.4.2 Execution Synchronization 


An instruction is execution synchronizing if all previously initiated instructions appear to 
have completed before the instruction is initiated or, in the case of sync and isync, before
the instruction completes. For example, the Move to Machine State Register (mtmsr) 
instruction is execution synchronizing. It ensures that all preceding instructions have 
completed execution and cannot cause an exception before the instruction executes, but 
does not ensure subsequent instructions execute in the newly established environment. For 
example, if the mtmsr sets the MSR[PR] bit, unless an isync immediately follows the
mtmsr instruction, a privileged instruction could be executed or privileged access could be 
performed without causing an exception even though the MSR[PR] bit indicates user mode. 


2.3.2.4.3 Instruction-Related Exceptions 


There are two kinds of exceptions in the MPC750—those caused directly by the execution 
of an instruction and those caused by an asynchronous event (or interrupts). Either may 
cause components of the system software to be invoked. 


Exceptions can be caused directly by the execution of an instruction as follows: 


• An attempt to execute an illegal instruction causes the illegal instruction (program
exception) handler to be invoked. An attempt by a user-level program to execute the
supervisor-level instructions listed below causes the privileged instruction (program
exception) handler to be invoked. The MPC750 provides the following
supervisor-level instructions: dcbi, mfmsr, mfspr, mfsr, mfsrin, mtmsr, mtspr,
mtsr, mtsrin, rfi, tlbie, and tlbsync. Note that the privilege level of the mfspr and
mtspr instructions depends on the SPR encoding.
• Any mtspr, mfspr, or mftb instruction with an invalid SPR (or TBR) field causes
an illegal type program exception. Likewise, a program exception is taken if
user-level software tries to access a supervisor-level SPR. An mtspr instruction
executing in supervisor mode (MSR[PR] = 0) with the SPR field specifying HID1
or PVR (read-only registers) executes as a no-op.
• An attempt to access memory that is not available (page fault) causes the ISI or DSI
exception handler to be invoked.
• The execution of an sc instruction invokes the system call exception handler that
permits a program to request the system to perform a service.
• The execution of a trap instruction invokes the program exception trap handler.
• The execution of an instruction that causes a floating-point exception while
exceptions are enabled in the MSR invokes the program exception handler.


A detailed description of exception conditions is provided in Chapter 4, “Exceptions.” 


2.3.3 Instruction Set Overview


This section provides a brief overview of the PowerPC instructions implemented in the 
MPC750 and highlights any special information with respect to how the MPC750 
implements a particular instruction. Note that the categories used in this section correspond 
to those used in Chapter 4, “Addressing Modes and Instruction Set Summary,” in the
Programming Environments Manual. These categorizations are somewhat arbitrary and are 
provided for the convenience of the programmer and do not necessarily reflect the PowerPC 
architecture specification. 

Note that some instructions have the following optional features: 

• CR Update—The dot (.) suffix on the mnemonic enables the update of the CR.
• Overflow option—The o suffix indicates that the overflow bit in the XER is enabled.


2.3.4 PowerPC UISA Instructions 


The PowerPC UISA includes the base user-level instruction set (excluding a few user-level 
cache control, synchronization, and time base instructions), user-level registers, 
programming model, data types, and addressing modes. This section discusses the 
instructions defined in the UISA. 


2.3.4.1 Integer Instructions
This section describes the integer instructions. These consist of the following:
• Integer arithmetic instructions
• Integer compare instructions
• Integer logical instructions
• Integer rotate and shift instructions


Integer instructions use the content of the GPRs as source operands and place results into 
GPRs, into the integer exception register (XER), and into condition register (CR) fields. 


2.3.4.1.1 Integer Arithmetic Instructions


Table 2-22 lists the integer arithmetic instructions for the processors in this family.

Table 2-22. Integer Arithmetic Instructions

Name                                Mnemonic                             Syntax
Add Immediate                       addi                                 rD,rA,SIMM
Add Immediate Shifted               addis                                rD,rA,SIMM
Add                                 add (add. addo addo.)                rD,rA,rB
Subtract From                       subf (subf. subfo subfo.)            rD,rA,rB
Add Immediate Carrying              addic                                rD,rA,SIMM
Add Immediate Carrying and Record   addic.                               rD,rA,SIMM
Subtract from Immediate Carrying    subfic                               rD,rA,SIMM
Add Carrying                        addc (addc. addco addco.)            rD,rA,rB
Subtract from Carrying              subfc (subfc. subfco subfco.)        rD,rA,rB
Add Extended                        adde (adde. addeo addeo.)            rD,rA,rB
Subtract from Extended              subfe (subfe. subfeo subfeo.)        rD,rA,rB
Add to Minus One Extended           addme (addme. addmeo addmeo.)        rD,rA
Subtract from Minus One Extended    subfme (subfme. subfmeo subfmeo.)    rD,rA
Add to Zero Extended                addze (addze. addzeo addzeo.)        rD,rA
Subtract from Zero Extended         subfze (subfze. subfzeo subfzeo.)    rD,rA
Negate                              neg (neg. nego nego.)                rD,rA
Multiply Low Immediate              mulli                                rD,rA,SIMM
Multiply Low                        mullw (mullw. mullwo mullwo.)        rD,rA,rB
Multiply High Word                  mulhw (mulhw.)                       rD,rA,rB
Multiply High Word Unsigned         mulhwu (mulhwu.)                     rD,rA,rB
Divide Word                         divw (divw. divwo divwo.)            rD,rA,rB
Divide Word Unsigned                divwu (divwu. divwuo divwuo.)        rD,rA,rB


Although there is no Subtract Immediate instruction, its effect can be achieved by using an 
addi instruction with the immediate operand negated. Simplified mnemonics are provided 
that include this negation. The subf instructions subtract the second operand (rA) from the 
third operand (rB). Simplified mnemonics are provided in which the third operand is 
subtracted from the second operand. See Appendix F, “Simplified Mnemonics,” in the 
Programming Environments Manual for examples. 
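
The operand ordering described above is easy to get backwards, so the following Python sketch models it on 32-bit values. It is a simplified illustration: the functions take register contents directly rather than register numbers, they ignore the XER and CR, and `addi` here assumes rA is not register 0.

```python
MASK32 = 0xFFFFFFFF

def addi(ra_value, simm):
    """addi rD,rA,SIMM (rA != 0 assumed): rD = rA + sign-extended SIMM."""
    if not -0x8000 <= simm <= 0x7FFF:
        raise ValueError("SIMM must fit in a signed 16-bit field")
    return (ra_value + simm) & MASK32

def subf(ra_value, rb_value):
    """subf rD,rA,rB: rD = rB - rA (second operand subtracted from the third)."""
    return (rb_value - ra_value) & MASK32

def subi(ra_value, value):
    """Simplified mnemonic subi rD,rA,value, equivalent to addi rD,rA,-value."""
    return addi(ra_value, -value)

def sub(ra_value, rb_value):
    """Simplified mnemonic sub rD,rA,rB, equivalent to subf rD,rB,rA: rD = rA - rB."""
    return subf(rb_value, ra_value)

print(subi(10, 3))       # prints 7
print(subf(3, 10))       # prints 7 (10 - 3)
print(sub(10, 3))        # prints 7
print(hex(subi(2, 5)))   # prints 0xfffffffd (result wraps modulo 2**32)
```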


The UISA states that an implementation that executes instructions that set the overflow 
enable bit (OE) or the carry bit (CA) may either execute these instructions slowly or prevent 
execution of the subsequent instruction until the operation completes. Chapter 6, 
“Instruction Timing,” describes how the MPC750 handles CR dependencies. The summary 
overflow bit (SO) and overflow bit (OV) in the integer exception register are set to reflect 
an overflow condition of a 32-bit result. This can happen only when OE = 1.


2.3.4.1.2 Integer Compare Instructions


The integer compare instructions algebraically or logically compare the contents of register 
rA with either the zero-extended value of the UIMM operand, the sign-extended value of 
the SIMM operand, or the contents of register rB. The comparison is signed for the cmpi 
and cmp instructions, and unsigned for the cmpli and cmpl instructions. Table 2-23 
summarizes the integer compare instructions. 


Table 2-23. Integer Compare Instructions

Name                        Mnemonic   Syntax
Compare Immediate           cmpi       crfD,L,rA,SIMM
Compare                     cmp        crfD,L,rA,rB
Compare Logical Immediate   cmpli      crfD,L,rA,UIMM
Compare Logical             cmpl       crfD,L,rA,rB


The crfD operand can be omitted if the result of the comparison is to be placed in CR0.
Otherwise the target CR field must be specified in crfD, using an explicit field number.


For information on simplified mnemonics for the integer compare instructions see 
Appendix F, “Simplified Mnemonics,” in the Programming Environments Manual. 
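
The difference between the signed (cmp, cmpi) and unsigned (cmpl, cmpli) comparisons can be illustrated with a small Python model. The helper names are assumptions of this sketch. Each function returns the four bits written to the target CR field in the order LT, GT, EQ, SO, where SO is a copy of XER[SO].

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def cr_field(a, b, so=0):
    """Build the CR field bits [LT, GT, EQ, SO] for two already-interpreted operands."""
    return [int(a < b), int(a > b), int(a == b), so]

def cmp_signed(ra_value, operand, so=0):
    """Model of cmp/cmpi: both operands are treated as signed 32-bit integers."""
    return cr_field(to_signed32(ra_value), to_signed32(operand), so)

def cmp_logical(ra_value, operand, so=0):
    """Model of cmpl/cmpli: both operands are treated as unsigned 32-bit integers."""
    return cr_field(ra_value & 0xFFFFFFFF, operand & 0xFFFFFFFF, so)

# The same bit patterns order differently under the two interpretations.
print(cmp_signed(0xFFFFFFFF, 1))    # prints [1, 0, 0, 0]: -1 < 1
print(cmp_logical(0xFFFFFFFF, 1))   # prints [0, 1, 0, 0]: 4294967295 > 1
```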


2.3.4.1.3 Integer Logical Instructions 


The logical instructions shown in Table 2-24 perform bit-parallel operations on the 
specified operands. Logical instructions with the CR updating enabled (uses dot suffix) and 
instructions andi. and andis. set CR field CR0 to characterize the result of the logical
operation. Logical instructions do not affect XER[SO], XER[OV], or XER[CA]. 


See Appendix F, “Simplified Mnemonics,” in the Programming Environments Manual for 
simplified mnemonic examples for integer logical operations. 


Table 2-24. Integer Logical Instructions

Name                       Mnemonic           Syntax       Implementation Notes
AND Immediate              andi.              rA,rS,UIMM   —
AND Immediate Shifted      andis.             rA,rS,UIMM   —
OR Immediate               ori                rA,rS,UIMM   The PowerPC architecture defines ori r0,r0,0 as the
                                                           preferred form for the no-op instruction. The dispatcher
                                                           discards this instruction (except for pending trace or
                                                           breakpoint exceptions).
OR Immediate Shifted       oris               rA,rS,UIMM   —
XOR Immediate              xori               rA,rS,UIMM   —
XOR Immediate Shifted      xoris              rA,rS,UIMM   —
AND                        and (and.)         rA,rS,rB     —
OR                         or (or.)           rA,rS,rB     —
XOR                        xor (xor.)         rA,rS,rB     —
NAND                       nand (nand.)       rA,rS,rB     —
NOR                        nor (nor.)         rA,rS,rB     —
Equivalent                 eqv (eqv.)         rA,rS,rB     —
AND with Complement        andc (andc.)       rA,rS,rB     —
OR with Complement         orc (orc.)         rA,rS,rB     —
Extend Sign Byte           extsb (extsb.)     rA,rS        —
Extend Sign Half Word      extsh (extsh.)     rA,rS        —
Count Leading Zeros Word   cntlzw (cntlzw.)   rA,rS        —


2.3.4.1.4 Integer Rotate and Shift Instructions 


Rotation operations are performed on data from a GPR, and the result, or a portion of the 
result, is returned to a GPR. See Appendix F, “Simplified Mnemonics,” in the 
Programming Environments Manual for a complete list of simplified mnemonics that 
allows simpler coding of often-used functions such as clearing the leftmost or rightmost 
bits of a register, left justifying or right justifying an arbitrary field, and simple rotates and 
shifts. 


Integer rotate instructions rotate the contents of a register. The result of the rotation is either 
inserted into the target register under control of a mask (if a mask bit is 1 the associated bit 
of the rotated data is placed into the target register, and if the mask bit is 0 the associated 
bit in the target register is unchanged), or ANDed with a mask before being placed into the 
target register. 
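
The rotate-then-mask behavior described above can be modeled in a few lines of Python. This is a sketch for illustration, with function names chosen to mirror the mnemonics. It assumes the PowerPC bit numbering in which bit 0 is the most-significant bit, and a mask that runs from bit MB through bit ME, wrapping around when MB is greater than ME. The `slwi` and `srwi` helpers show how an immediate-form logical shift reduces to a rotate with a mask.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value, sh):
    """Rotate a 32-bit value left by sh bit positions."""
    value &= MASK32
    sh &= 31
    return ((value << sh) | (value >> (32 - sh))) & MASK32

def mask(mb, me):
    """Ones from bit MB through bit ME (bit 0 = most significant), wrapping if MB > ME."""
    m = 0
    bit = mb
    while True:
        m |= 0x80000000 >> bit
        if bit == me:
            return m
        bit = (bit + 1) & 31

def rlwinm(rs, sh, mb, me):
    """Rotate left, then AND with the mask."""
    return rotl32(rs, sh) & mask(mb, me)

def rlwimi(ra, rs, sh, mb, me):
    """Rotate left, then insert under the mask; unmasked bits of rA are unchanged."""
    m = mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)

def slwi(rs, n):
    """Shift left by n < 32, expressed as rlwinm rA,rS,n,0,31-n."""
    return rlwinm(rs, n, 0, 31 - n)

def srwi(rs, n):
    """Shift right (logical) by n < 32, expressed as rlwinm rA,rS,32-n,n,31."""
    return rlwinm(rs, 32 - n, n, 31)

print(hex(rlwinm(0x12345678, 8, 24, 31)))            # prints 0x12 (extracts the top byte)
print(hex(rlwimi(0xAAAAAAAA, 0xFF, 8, 16, 23)))      # prints 0xaaaaffaa
print(hex(slwi(0x80000001, 4)))                      # prints 0x10
```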


The integer rotate instructions are summarized in Table 2-25. 


Table 2-25. Integer Rotate Instructions

Name                                            Mnemonic           Syntax
Rotate Left Word Immediate then AND with Mask   rlwinm (rlwinm.)   rA,rS,SH,MB,ME
Rotate Left Word then AND with Mask             rlwnm (rlwnm.)     rA,rS,rB,MB,ME
Rotate Left Word Immediate then Mask Insert     rlwimi (rlwimi.)   rA,rS,SH,MB,ME


The integer shift instructions perform left and right shifts. Immediate-form logical 
(unsigned) shift operations are obtained by specifying masks and shift values for certain 
rotate instructions. Simplified mnemonics (shown in Appendix F, “Simplified 
Mnemonics,” in the Programming Environments Manual) are provided to make coding of 
such shifts simpler and easier to understand.


Multiple-precision shifts can be programmed as shown in Appendix C, “Multiple-Precision
Shifts,” in the Programming Environments Manual. The integer shift instructions are
summarized in Table 2-26.


Table 2-26. Integer Shift Instructions

Name                                   Mnemonic         Syntax
Shift Left Word                        slw (slw.)       rA,rS,rB
Shift Right Word                       srw (srw.)       rA,rS,rB
Shift Right Algebraic Word Immediate   srawi (srawi.)   rA,rS,SH
Shift Right Algebraic Word             sraw (sraw.)     rA,rS,rB


2.3.4.2 Floating-Point Instructions 


This section describes the floating-point instructions, which include the following: 
• Floating-point arithmetic instructions
• Floating-point multiply-add instructions
• Floating-point rounding and conversion instructions
• Floating-point compare instructions
• Floating-point status and control register instructions
• Floating-point move instructions


See Section 2.3.4.3, “Load and Store Instructions,” for information about floating-point 
loads and stores. 


The PowerPC architecture supports a floating-point system as defined in the IEEE 754 
standard, but requires software support to conform with that standard. All floating-point 
operations conform to the IEEE 754 standard, except if software sets the non-IEEE mode 
FPSCR[NI].


2.3.4.2.1 Floating-Point Arithmetic Instructions
The floating-point arithmetic instructions are summarized in Table 2-27. 


Table 2-27. Floating-Point Arithmetic Instructions

Name                                         Mnemonic             Syntax
Floating Add (Double-Precision)              fadd (fadd.)         frD,frA,frB
Floating Add Single                          fadds (fadds.)       frD,frA,frB
Floating Subtract (Double-Precision)         fsub (fsub.)         frD,frA,frB
Floating Subtract Single                     fsubs (fsubs.)       frD,frA,frB
Floating Multiply (Double-Precision)         fmul (fmul.)         frD,frA,frC
Floating Multiply Single                     fmuls (fmuls.)       frD,frA,frC
Floating Divide (Double-Precision)           fdiv (fdiv.)         frD,frA,frB
Floating Divide Single                       fdivs (fdivs.)       frD,frA,frB
Floating Reciprocal Estimate Single 1        fres (fres.)         frD,frB
Floating Reciprocal Square Root Estimate 1   frsqrte (frsqrte.)   frD,frB
Floating Select 1                            fsel                 frD,frA,frC,frB

1 The fsel instruction is optional in the PowerPC architecture.


All single-precision arithmetic instructions are performed using a double-precision format. 
The floating-point architecture is a single-pass implementation for double-precision 
products. In most cases, a single-precision instruction using only single-precision 
operands, in double-precision format, has the same latency as its double-precision 
equivalent. 


2.3.4.2.2 Floating-Point Multiply-Add Instructions 


These instructions combine multiply and add operations without an intermediate rounding 
operation. The floating-point multiply-add instructions are summarized in Table 2-28. 
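
The effect of omitting the intermediate rounding can be seen in a short Python sketch. It is only a numerical illustration, not a description of the MPC750 floating-point unit: `fmadd` below computes (frA × frC) + frB exactly with rational arithmetic and rounds once, assuming round-to-nearest, while `mul_then_add` rounds the product before adding.

```python
from fractions import Fraction

def fmadd(fra, frc, frb):
    """Model of a fused multiply-add: exact (frA * frC) + frB, then one rounding."""
    exact = Fraction(fra) * Fraction(frc) + Fraction(frb)
    return float(exact)

def mul_then_add(fra, frc, frb):
    """Separate multiply and add: the product is rounded to double before the add."""
    return (fra * frc) + frb

a = 1.0 + 2.0**-30
c = 1.0 - 2.0**-30
b = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
print(fmadd(a, c, b) == -(2.0**-60))   # prints True: the small term survives
print(mul_then_add(a, c, b))           # prints 0.0: the small term is lost
```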


Table 2-28. Floating-Point Multiply-Add Instructions

Name                                                     Mnemonic             Syntax
Floating Multiply-Add (Double-Precision)                 fmadd (fmadd.)       frD,frA,frC,frB
Floating Multiply-Add Single                             fmadds (fmadds.)     frD,frA,frC,frB
Floating Multiply-Subtract (Double-Precision)            fmsub (fmsub.)       frD,frA,frC,frB
Floating Multiply-Subtract Single                        fmsubs (fmsubs.)     frD,frA,frC,frB
Floating Negative Multiply-Add (Double-Precision)        fnmadd (fnmadd.)     frD,frA,frC,frB
Floating Negative Multiply-Add Single                    fnmadds (fnmadds.)   frD,frA,frC,frB
Floating Negative Multiply-Subtract (Double-Precision)   fnmsub (fnmsub.)     frD,frA,frC,frB
Floating Negative Multiply-Subtract Single               fnmsubs (fnmsubs.)   frD,frA,frC,frB


2.3.4.2.3 Floating-Point Rounding and Conversion Instructions 


The Floating Round to Single-Precision (frsp) instruction is used to truncate a 64-bit 
double-precision number to a 32-bit single-precision floating-point number. The 
floating-point convert instructions convert a 64-bit double-precision floating-point number 
to a 32-bit signed integer number. 
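
As a rough numerical illustration of these conversions, the following Python sketch models frsp and fctiwz. It makes several assumptions that should be checked against the Programming Environments Manual: frsp is modeled with round-to-nearest only (the hardware rounds according to FPSCR[RN]), exceptions and FPSCR updates are ignored, and out-of-range or NaN inputs to fctiwz are taken to saturate as shown.

```python
import math
import struct

def frsp(value):
    """Round a double to single precision; the result is still held as a double."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

def fctiwz(value):
    """Convert a double to a signed 32-bit integer, rounding toward zero."""
    if math.isnan(value):
        return -0x80000000          # assumed result for a NaN operand
    if value >= 0x7FFFFFFF:
        return 0x7FFFFFFF           # saturate at the largest positive integer
    if value <= -0x80000000:
        return -0x80000000          # saturate at the most negative integer
    return math.trunc(value)

print(frsp(1.0 + 2.0**-30))   # prints 1.0: the low-order bits do not fit in single precision
print(fctiwz(-2.9))           # prints -2
print(fctiwz(3e10))           # prints 2147483647
```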


Examples of uses of these instructions to perform various conversions can be found in
Appendix D, “Floating-Point Models,” in the Programming Environments Manual.


Table 2-29. Floating-Point Rounding and Conversion Instructions

Name                                                      Mnemonic           Syntax
Floating Round to Single                                  frsp (frsp.)       frD,frB
Floating Convert to Integer Word                          fctiw (fctiw.)     frD,frB
Floating Convert to Integer Word with Round toward Zero   fctiwz (fctiwz.)   frD,frB


2.3.4.2.4 Floating-Point Compare Instructions 


Floating-point compare instructions compare the contents of two floating-point registers. 
The comparison ignores the sign of zero (that is, +0 = −0). The floating-point compare
instructions are summarized in Table 2-30. 
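
A minimal Python sketch of the comparison result follows. It assumes the usual CR field encoding for floating-point compares (FL, FG, FE, FU, in that order) and ignores the FPSCR updates and the exception behavior that distinguish fcmpo from fcmpu.

```python
import math

def fcmpu(fra, frb):
    """Return the CR field bits [FL, FG, FE, FU] for a floating-point compare."""
    if math.isnan(fra) or math.isnan(frb):
        return [0, 0, 0, 1]                      # unordered
    return [int(fra < frb), int(fra > frb), int(fra == frb), 0]

print(fcmpu(0.0, -0.0))           # prints [0, 0, 1, 0]: the sign of zero is ignored
print(fcmpu(1.0, float("nan")))   # prints [0, 0, 0, 1]
```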


Table 2-30. Floating-Point Compare Instructions

Name                         Mnemonic   Syntax
Floating Compare Unordered   fcmpu      crfD,frA,frB
Floating Compare Ordered     fcmpo      crfD,frA,frB


The PowerPC architecture allows an fcmpu or fcmpo instruction with the Rc bit set to
produce a boundedly-undefined result, which may include an illegal instruction program
exception. In the MPC750, crfD should be treated as undefined.


2.3.4.2.5 Floating-Point Status and Control Register Instructions 


Every FPSCR instruction appears to synchronize the effects of all floating-point 
instructions executed by a given processor. Executing an FPSCR instruction ensures that all 
floating-point instructions previously initiated by the given processor appear to have 
completed before the FPSCR instruction is initiated and that no subsequent floating-point 
instructions appear to be initiated by the given processor until the FPSCR instruction has 
completed. The FPSCR instructions are summarized in Table 2-31. 


Table 2-31. Floating-Point Status and Control Register Instructions

Name                                    Mnemonic           Syntax
Move from FPSCR                         mffs (mffs.)       frD
Move to Condition Register from FPSCR   mcrfs              crfD,crfS
Move to FPSCR Field Immediate           mtfsfi (mtfsfi.)   crfD,IMM
Move to FPSCR Fields                    mtfsf (mtfsf.)     FM,frB
Move to FPSCR Bit 0                     mtfsb0 (mtfsb0.)   crbD
Move to FPSCR Bit 1                     mtfsb1 (mtfsb1.)   crbD


Implementation Note—The PowerPC architecture states that in some implementations, 
the Move to FPSCR Fields (mtfsf) instruction may perform more slowly when only some
of the fields are updated as opposed to all of the fields. In the MPC750, there is no
degradation of performance. 


2.3.4.2.6 Floating-Point Move Instructions 


Floating-point move instructions copy data from one FPR to another. The floating-point 
move instructions do not modify the FPSCR. The CR update option in these instructions 
controls the placing of result status into CR1. Table 2-32 summarizes the floating-point 
move instructions. 


Table 2-32. Floating-Point Move Instructions

Name                               Mnemonic         Syntax
Floating Move Register             fmr (fmr.)       frD,frB
Floating Negate                    fneg (fneg.)     frD,frB
Floating Absolute Value            fabs (fabs.)     frD,frB
Floating Negative Absolute Value   fnabs (fnabs.)   frD,frB


2.3.4.3 Load and Store Instructions 


Load and store instructions are issued and translated in program order; however, the 
accesses can occur out of order. Synchronizing instructions are provided to enforce strict 
ordering. This section describes the load and store instructions, which consist of the 
following: 


• Integer load instructions
• Integer store instructions
• Integer load and store with byte-reverse instructions
• Integer load and store multiple instructions
• Floating-point load instructions
• Floating-point store instructions
• Memory synchronization instructions


Implementation Notes—The following describes how the MPC750 handles
misalignment: 


The MPC750 provides hardware support for misaligned memory accesses. It performs 
those accesses within a single cycle if the operand lies within a double-word boundary. 
Misaligned memory accesses that cross a double-word boundary degrade performance. 
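
Whether a given access pays the penalty described above depends on whether it spans two aligned 8-byte blocks. A minimal Python check, with hypothetical function names, could look like this:

```python
def is_misaligned(ea, size):
    """True if an access of `size` bytes is not naturally aligned at address `ea`."""
    return ea % size != 0

def crosses_double_word(ea, size):
    """True if the bytes ea .. ea+size-1 fall in two different aligned 8-byte blocks."""
    return (ea >> 3) != ((ea + size - 1) >> 3)

# A misaligned word that stays inside one double word, and one that does not.
print(is_misaligned(0x1001, 4), crosses_double_word(0x1001, 4))   # prints True False
print(is_misaligned(0x1006, 4), crosses_double_word(0x1006, 4))   # prints True True
```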


For string operations, the hardware makes no attempt to combine register values to reduce 
the number of discrete accesses. Combining stores enhances performance if store gathering 
is enabled and the accesses meet the criteria described in Section 6.4.7, “Integer Store 
Gathering.” Note that the PowerPC architecture requires load/store multiple instruction 
accesses to be aligned. At a minimum, additional cache access cycles are required.


Although many unaligned memory accesses are supported in hardware, the frequent use of
them is discouraged since they can compromise the overall performance of the processor. 


Accesses that cross a translation boundary may be restarted. That is, a misaligned access 
that crosses a page boundary is completely restarted if the second portion of the access 
causes a page fault. This may cause the first access to be repeated. 


On some processors, such as the MPC603, a TLB reload would cause an instruction restart. 
On the MPC750, TLB reloads are done transparently and only a page fault causes a restart. 


2.3.4.3.1  Self-Modifying Code 


When a processor modifies a memory location that may be contained in the instruction 
cache, software must ensure that memory updates are visible to the instruction fetching 
mechanism. This can be achieved by the following instruction sequence: 


    dcbst    |update memory
    sync     |wait for update
    icbi     |remove (invalidate) copy in instruction cache
    isync    |remove copy in own instruction buffer


These operations are required because the data cache is a write-back cache. Since 
instruction fetching bypasses the data cache, changes to items in the data cache may not be 
reflected in memory until the fetch operations complete. 


Special care must be taken to avoid coherency paradoxes in systems that implement unified 
secondary caches, and designers should carefully follow the guidelines for maintaining 
cache coherency that are provided in the VEA, and discussed in Chapter 5, “Cache Model 
and Memory Coherency,” in the Programming Environments Manual. Because the 
MPC750 does not broadcast the M bit for instruction fetches, external caches are subject to 
coherency paradoxes. 


2.3.4.3.2 Integer Load and Store Address Generation 


Integer load and store operations generate effective addresses using register indirect with 
immediate index mode, register indirect with index mode, or register indirect mode. See 
Section 2.3.2.3, “Effective Address Calculation,” for information about calculating
effective addresses. Note that in some implementations, operations that are not naturally
aligned may suffer performance degradation. Refer to Section 4.5.6, “Alignment Exception
(0x00600),” for additional information about load and store address alignment exceptions.


2.3.4.3.3 Register Indirect Integer Load Instructions 


For integer load instructions, the byte, half word, word, or double word addressed by the 
EA (effective address) is loaded into rD. Many integer load instructions have an update 
form, in which rA is updated with the generated effective address. For these forms, if 
rA ≠ 0 and rA ≠ rD (otherwise invalid), the EA is placed into rA and the memory element
(byte, half word, word, or double word) addressed by the EA is loaded into rD. Note that
the PowerPC architecture defines load with update instructions with operand rA = 0 or
rA = rD as invalid forms.


Implementation Notes—The following notes describe the MPC750 implementation of 
integer load instructions: 


The PowerPC architecture cautions programmers that some implementations of the 
architecture may execute the load half algebraic (lha, lhax) instructions with greater
latency than other types of load instructions. This is not the case for the MPC750; 
these instructions operate with the same latency as other load instructions. 


The PowerPC architecture cautions programmers that some implementations of the 
architecture may run the load/store byte-reverse (lhbrx, lwbrx, sthbrx, stwbrx)
instructions with greater latency than other types of load/store instructions. This is 
not the case for the MPC750. These instructions operate with the same latency as the 
other load/store instructions. 


The PowerPC architecture describes some preferred instruction forms for load and 
store multiple instructions and integer move assist instructions that may perform 
better than other forms in some implementations. None of these preferred forms 
affect instruction performance on the MPC750. 


The PowerPC architecture defines the lwarx and stwcx. as a way to update memory
atomically. In the MPC750, reservations are made on behalf of aligned 32-byte
sections of the memory address space. Executing lwarx and stwcx. to a page marked
write-through does not cause a DSI exception if the W bit is set, but as with other
memory accesses, DSI exceptions can result for other reasons such as protection
violations or page faults.


In general, because stwcx. always causes an external bus transaction it has slightly
worse performance characteristics than normal store operations. 


Table 2-33 summarizes the integer load instructions. 


Table 2-33. Integer Load Instructions

Name                                           Mnemonic   Syntax
Load Byte and Zero                             lbz        rD,d(rA)
Load Byte and Zero Indexed                     lbzx       rD,rA,rB
Load Byte and Zero with Update                 lbzu       rD,d(rA)
Load Byte and Zero with Update Indexed         lbzux      rD,rA,rB
Load Half Word and Zero                        lhz        rD,d(rA)
Load Half Word and Zero Indexed                lhzx       rD,rA,rB
Load Half Word and Zero with Update            lhzu       rD,d(rA)
Load Half Word and Zero with Update Indexed    lhzux      rD,rA,rB
Load Half Word Algebraic                       lha        rD,d(rA)
Load Half Word Algebraic Indexed               lhax       rD,rA,rB
Load Half Word Algebraic with Update           lhau       rD,d(rA)
Load Half Word Algebraic with Update Indexed   lhaux      rD,rA,rB
Load Word and Zero                             lwz        rD,d(rA)
Load Word and Zero Indexed                     lwzx       rD,rA,rB
Load Word and Zero with Update                 lwzu       rD,d(rA)
Load Word and Zero with Update Indexed         lwzux      rD,rA,rB


2.3.4.3.4 Integer Store Instructions 


For integer store instructions, the contents of rS are stored into the byte, half word, word or 
double word in memory addressed by the EA (effective address). Many store instructions 
have an update form, in which rA is updated with the EA. For these forms, the following 
rules apply: 


• If rA ≠ 0, the effective address is placed into rA.
• If rS = rA, the contents of register rS are copied to the target memory element, then
the generated EA is placed into rA (rS).


The PowerPC architecture defines store with update instructions with rA = 0 as an invalid 
form. In addition, it defines integer store instructions with the CR update option enabled 
(Rc field, bit 31, in the instruction encoding = 1) to be an invalid form. Table 2-34
summarizes the integer store instructions. 
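
The update rules above, including the rS = rA case, can be illustrated with a small Python model of stwu. The register-file list `gpr` and the word-addressed dictionary `memory` are assumptions of this sketch. The example uses rS = rA = r1, a form commonly seen when a stack pointer is stored and decremented in one instruction.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Sign-extend a 16-bit displacement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def stwu(gpr, memory, rs, d, ra):
    """Model of stwu rS,d(rA): store the word at EA = rA + d, then place EA into rA."""
    if ra == 0:
        raise ValueError("store with update with rA = 0 is an invalid form")
    ea = (gpr[ra] + sign_extend16(d)) & MASK32
    memory[ea] = gpr[rs]    # the original contents of rS are stored first
    gpr[ra] = ea            # then rA receives the EA, even when rS = rA

gpr = [0] * 32
gpr[1] = 0x00001000         # hypothetical stack pointer value
memory = {}

stwu(gpr, memory, 1, -16, 1)
print(hex(memory[0xFF0]))   # prints 0x1000: the old value of r1 was stored
print(hex(gpr[1]))          # prints 0xff0: r1 now holds the EA
```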


Table 2-34. Integer Store Instructions

Name                                  Mnemonic   Syntax
Store Byte                            stb        rS,d(rA)
Store Byte Indexed                    stbx       rS,rA,rB
Store Byte with Update                stbu       rS,d(rA)
Store Byte with Update Indexed        stbux      rS,rA,rB
Store Half Word                       sth        rS,d(rA)
Store Half Word Indexed               sthx       rS,rA,rB
Store Half Word with Update           sthu       rS,d(rA)
Store Half Word with Update Indexed   sthux      rS,rA,rB
Store Word                            stw        rS,d(rA)
Store Word Indexed                    stwx       rS,rA,rB
Store Word with Update                stwu       rS,d(rA)
Store Word with Update Indexed        stwux      rS,rA,rB


2.3.4.3.5 Integer Store Gathering


The MPC750 performs store gathering for write-through accesses to nonguarded space or
for cache-inhibited stores to nonguarded space if the stores are 4 bytes and they are
word-aligned. These stores are combined in the load/store unit (LSU) to form a double
word and are sent out on the 60x bus as a single-beat operation. However, stores can be
gathered only if the successive stores that meet the criteria are queued and pending. Store
gathering takes place regardless of the address order of the stores. The store gathering
feature is enabled by setting HID0[SGE]. Store gathering is done for both big- and
little-endian modes.


Store gathering is not done for the following: 
• Cacheable stores
• Stores to guarded cache-inhibited or write-through space
• Byte-reverse store
• stwcx. and ecowx accesses
• Floating-point stores
• Store operations attempted during a hardware table search


If store gathering is enabled and the stores do not fall under the above categories, an eieio 
or sync instruction must be used to prevent two stores from being gathered. 
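
The eligibility criteria in this section can be collected into a single predicate. The Python sketch below is a reading aid only: the `Store` record and its field names are hypothetical, and the check covers only whether one store qualifies, not whether a partner store is queued and pending at the same time.

```python
from dataclasses import dataclass

@dataclass
class Store:
    """Hypothetical description of one store access and its page attributes."""
    address: int
    size: int                        # operand size in bytes
    write_through: bool = False      # W bit
    cache_inhibited: bool = False    # I bit
    guarded: bool = False            # G bit
    byte_reverse: bool = False
    floating_point: bool = False
    stwcx_or_ecowx: bool = False
    during_table_search: bool = False

def eligible_for_gathering(store, sge_enabled):
    """Apply the store gathering criteria listed in this section to one store."""
    if not sge_enabled:                                   # HID0[SGE] must be set
        return False
    if not (store.write_through or store.cache_inhibited):
        return False                                      # cacheable stores are not gathered
    if store.guarded:
        return False
    if store.size != 4 or store.address % 4 != 0:
        return False                                      # must be a word-aligned 4-byte store
    return not (store.byte_reverse or store.floating_point
                or store.stwcx_or_ecowx or store.during_table_search)

print(eligible_for_gathering(Store(0x2000, 4, cache_inhibited=True), True))                # prints True
print(eligible_for_gathering(Store(0x2000, 4, cache_inhibited=True, guarded=True), True))  # prints False
```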


2.3.4.3.6 Integer Load and Store with Byte-Reverse Instructions 


Table 2-35 describes integer load and store with byte-reverse instructions. When used in a 
system operating with the default big-endian byte order, these instructions have the effect 
of loading and storing data in little-endian order. Likewise, when used in a system operating 
with little-endian byte order, these instructions have the effect of loading and storing data 
in big-endian order. For more information about big-endian and little-endian byte ordering, 
see “Byte Ordering,” in Chapter 3, “Operand Conventions,” in the Programming
Environments Manual. 
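
For a processor running in the default big-endian mode, the effect of the byte-reverse forms can be sketched in Python as follows. The `bytearray` standing in for memory and the function names (which mirror the mnemonics) are assumptions of this example.

```python
def lwz(memory, ea):
    """Ordinary word load in big-endian mode: lowest address is most significant."""
    return int.from_bytes(memory[ea:ea + 4], "big")

def lwbrx(memory, ea):
    """Byte-reversed word load: the same four bytes read in little-endian order."""
    return int.from_bytes(memory[ea:ea + 4], "little")

def lhbrx(memory, ea):
    """Byte-reversed half-word load, zero-extended to 32 bits."""
    return int.from_bytes(memory[ea:ea + 2], "little")

def stwbrx(memory, ea, value):
    """Byte-reversed word store: least-significant byte goes to the lowest address."""
    memory[ea:ea + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

memory = bytearray(b"\x12\x34\x56\x78")
print(hex(lwz(memory, 0)))     # prints 0x12345678
print(hex(lwbrx(memory, 0)))   # prints 0x78563412
print(hex(lhbrx(memory, 0)))   # prints 0x3412

stwbrx(memory, 0, 0xAABBCCDD)
print(memory.hex())            # prints ddccbbaa
```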


Table 2-35. Integer Load and Store with Byte-Reverse Instructions

Name                                   Mnemonic   Syntax
Load Half Word Byte-Reverse Indexed    lhbrx      rD,rA,rB
Load Word Byte-Reverse Indexed         lwbrx      rD,rA,rB
Store Half Word Byte-Reverse Indexed   sthbrx     rS,rA,rB
Store Word Byte-Reverse Indexed        stwbrx     rS,rA,rB


2.3.4.3.7 Integer Load and Store Multiple Instructions


The load/store multiple instructions are used to move blocks of data to and from the GPRs. 
The load multiple and store multiple instructions may have operands that require memory
accesses crossing a 4-Kbyte page boundary. As a result, these instructions may be
interrupted by a DSI exception associated with the address translation of the second page. 
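
To see when a load multiple can touch a second page, recall that lmw rD,d(rA) transfers registers rD through r31, four bytes each. The following Python sketch, with hypothetical helper names, computes the byte count and tests for a 4-Kbyte page crossing.

```python
PAGE_SIZE = 4096   # 4-Kbyte pages

def lmw_byte_count(rd):
    """Bytes transferred by lmw rD,d(rA): GPRs rD through r31, 4 bytes each."""
    return 4 * (32 - rd)

def crosses_page(ea, nbytes):
    """True if the bytes ea .. ea+nbytes-1 lie in two different 4-Kbyte pages."""
    return (ea // PAGE_SIZE) != ((ea + nbytes - 1) // PAGE_SIZE)

# lmw r29,... loads r29, r30, and r31 (12 bytes).
print(lmw_byte_count(29))                          # prints 12
print(crosses_page(0x1FF8, lmw_byte_count(29)))    # prints True
print(crosses_page(0x1FF0, lmw_byte_count(29)))    # prints False
```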


Implementation Notes—The following describes the MPC750 implementation of the 
load/store multiple instruction: 


• For load/store string operations, the hardware does not combine register values to
reduce the number of discrete accesses. However, if store gathering is enabled and
the accesses fall under the criteria for store gathering the stores may be combined to
enhance performance. At a minimum, additional cache access cycles are required.
• The MPC750 supports misaligned, single-register load and store accesses in
little-endian mode without causing an alignment exception. However, execution of
misaligned load/store multiple/string operations causes an alignment exception.


The PowerPC architecture defines the load multiple word (lmw) instruction with rA in the
range of registers to be loaded as an invalid form. 


Table 2-36. Integer Load and Store Multiple Instructions

Name                  Mnemonic   Syntax
Load Multiple Word    lmw        rD,d(rA)
Store Multiple Word   stmw       rS,d(rA)


2.3.4.3.8 Integer Load and Store String Instructions 


The integer load and store string instructions allow movement of data from memory to 
registers or from registers to memory without concern for alignment. These instructions can 
be used for a short move between arbitrary memory locations or to initiate a long move 
between misaligned memory fields. However, in some implementations, these instructions 
are likely to have greater latency and take longer to execute, perhaps much longer, than a 
sequence of individual load or store instructions that produce the same results. Table 2-37 
summarizes the integer load and store string instructions. 


In other implementations operating with little-endian byte order, execution of a load or 
string instruction invokes the alignment error handler; see “Byte Ordering,” in the 
Programming Environments Manual for more information. 


Table 2-37. Integer Load and Store String Instructions

Name                          Mnemonic   Syntax
Load String Word Immediate    lswi       rD,rA,NB
Load String Word Indexed      lswx       rD,rA,rB
Store String Word Immediate   stswi      rS,rA,NB
Store String Word Indexed     stswx      rS,rA,rB


Load string and store string instructions may involve operands that are not word-aligned.
As described in Section 4.5.6, “Alignment Exception (0x00600),” a misaligned string
operation suffers a performance penalty compared to an aligned operation of the same type.
A non-word-aligned string operation that crosses a 4-Kbyte boundary, or a word-aligned
string operation that crosses a 256-Mbyte boundary, always causes an alignment exception.
A non-word-aligned string operation that crosses a double-word boundary is also slower
than a word-aligned string operation.
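
The boundary rule above can be sketched as a small predicate. This is only an illustration of the two conditions stated in the text; the function names are hypothetical, and the sketch assumes the operation touches the contiguous byte range starting at the effective address.

```python
PAGE_4K = 4 * 1024                  # 4-Kbyte boundary from the text
SEGMENT_256M = 256 * 1024 * 1024    # 256-Mbyte boundary from the text


def crosses_boundary(ea, nbytes, boundary):
    """True if the byte range [ea, ea + nbytes) spans two boundary-sized blocks."""
    if nbytes <= 0:
        return False
    return ea // boundary != (ea + nbytes - 1) // boundary


def string_op_takes_alignment_exception(ea, nbytes):
    """Apply the two rules quoted above (hypothetical helper, not a hardware model)."""
    if ea % 4 != 0:
        # Non-word-aligned: crossing a 4-Kbyte boundary always faults.
        return crosses_boundary(ea, nbytes, PAGE_4K)
    # Word-aligned: only crossing a 256-Mbyte boundary always faults.
    return crosses_boundary(ea, nbytes, SEGMENT_256M)
```

For example, an 8-byte string operation at 0x0FFE faults under this rule, while the same operation at the word-aligned address 0x0FFC does not.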


Implementation Note—The following describes the MPC750 implementation of 
load/store string instructions: 


• For load/store string operations, the hardware does not combine register values to
reduce the number of discrete accesses. However, if store gathering is enabled and
the accesses fall under the criteria for store gathering, the stores may be combined to
enhance performance. At a minimum, additional cache access cycles are required.

• The MPC750 supports misaligned, single-register load and store accesses in
little-endian mode without causing an alignment exception. However, execution of
misaligned load/store multiple/string operations causes an alignment exception.


2.3.4.3.9 Floating-Point Load and Store Address Generation 


Floating-point load and store operations generate effective addresses using the register 
indirect with immediate index addressing mode and register indirect with index addressing 
mode. Floating-point loads and stores are not supported for direct-store accesses. The use 
of floating-point loads and stores for direct-store access results in an alignment exception. 


There are two forms of the floating-point load instruction—single-precision and 
double-precision operand formats. Because the FPRs support only the floating-point 
double-precision format, single-precision floating-point load instructions convert 
single-precision data to double-precision format before loading an operand into an FPR. 


Implementation Notes—The MPC750 treats exceptions as follows: 


• The FPU can be run in two different modes: ignore exceptions mode (MSR[FE0] =
MSR[FE1] = 0) and precise mode (any other settings for MSR[FE0,FE1]). For the
MPC750, ignore exceptions mode allows floating-point instructions to complete
earlier and thus may provide better performance than precise mode.

• The floating-point load and store indexed instructions (lfsx, lfsux, lfdx, lfdux, stfsx,
stfsux, stfdx, stfdux) are invalid when the Rc bit is one. In the MPC750, executing
one of these invalid instruction forms causes CR0 to be set to an undefined value.


The PowerPC architecture defines a load with update instruction with rA = 0 as an invalid
form. Table 2-38 summarizes the floating-point load instructions.

Table 2-38. Floating-Point Load Instructions

Name                                             Mnemonic   Syntax
Load Floating-Point Single                       lfs        frD,d(rA)
Load Floating-Point Single Indexed               lfsx       frD,rA,rB
Load Floating-Point Single with Update           lfsu       frD,d(rA)
Load Floating-Point Single with Update Indexed   lfsux      frD,rA,rB
Load Floating-Point Double                       lfd        frD,d(rA)
Load Floating-Point Double Indexed               lfdx       frD,rA,rB
Load Floating-Point Double with Update           lfdu       frD,d(rA)
Load Floating-Point Double with Update Indexed   lfdux      frD,rA,rB


2.3.4.3.10 Floating-Point Store Instructions


This section describes floating-point store instructions. There are three basic forms of the
store instruction: single-precision, double-precision, and integer. The integer form is
supported by the optional stfiwx instruction. Because the FPRs support only floating-point,
double-precision format for floating-point data, single-precision floating-point store
instructions convert double-precision data to single-precision format before storing the
operands. Table 2-39 summarizes the floating-point store instructions.


Table 2-39. Floating-Point Store Instructions

Name                                              Mnemonic   Syntax
Store Floating-Point Single                       stfs       frS,d(rA)
Store Floating-Point Single Indexed               stfsx      frS,rA,rB
Store Floating-Point Single with Update           stfsu      frS,d(rA)
Store Floating-Point Single with Update Indexed   stfsux     frS,rA,rB
Store Floating-Point Double                       stfd ¹     frS,d(rA)
Store Floating-Point Double Indexed               stfdx      frS,rA,rB
Store Floating-Point Double with Update           stfdu      frS,d(rA)
Store Floating-Point Double with Update Indexed   stfdux     frS,rA,rB
Store Floating-Point as Integer Word Indexed ²    stfiwx     frS,rA,rB

Notes:

1 The MPC750 and MPC755 require that the FPRs be initialized with floating-point values before the stfd
instruction is used. Otherwise, a random power-on value for an FPR may cause unpredictable device
behavior when the stfd instruction is executed. Note that any floating-point value loaded into the FPRs is
acceptable.

2 The stfiwx instruction is optional to the PowerPC architecture.


Some floating-point store instructions require conversions in the LSU. Table 2-40 shows
conversions the LSU makes when executing a Store Floating-Point Single instruction.


Table 2-40. Store Floating-Point Single Behavior























FPR Precision   Data Type              Action
Single          Normalized             Store
Single          Denormalized           Store
Single          Zero, infinity, QNaN   Store
Single          SNaN                   Store
Double          Normalized             If (exp ≤ 896) then Denormalize and Store, else Store
Double          Denormalized           Store zero
Double          Zero, infinity, QNaN   Store
Double          SNaN                   Store








Table 2-41 shows the conversions made when performing a Store Floating-Point Double 
instruction. Most entries in the table indicate that the floating-point value is simply stored. 
Only in a few cases are any other actions taken. 


Table 2-41. Store Floating-Point Double Behavior

FPR Precision   Data Type              Action
Single          Normalized             Store
Single          Denormalized           Normalize and Store
Single          Zero, infinity, QNaN   Store
Single          SNaN                   Store
Double          Normalized             Store
Double          Denormalized           Store
Double          Zero, infinity, QNaN   Store
Double          SNaN                   Store


Architecturally, all floating-point numbers are represented in double-precision format
within the MPC750. Execution of a store floating-point single (stfs, stfsu, stfsx, stfsux)
instruction requires conversion from double- to single-precision format. If the exponent is
not greater than 896, this conversion requires denormalization. The MPC750 supports this
denormalization by shifting the mantissa one bit at a time. Anywhere from 1 to 23 clock
cycles are required to complete the denormalization, depending upon the value to be stored.
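
The exponent test described above can be illustrated with a short sketch that inspects the biased exponent of an IEEE 754 double. The function names are hypothetical. The shift count of 897 minus the exponent is an assumption chosen because it yields 1 to 23 for the exponents that map to single-precision denormalized values; it is not taken from the manual.

```python
import struct


def double_fields(x):
    """Return (biased exponent, fraction) of x viewed as an IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def stfs_action(x):
    """Classify a double-precision operand for a store floating-point single.

    Returns (action, shift_count). The shift count is an assumed estimate
    of the one-bit-per-cycle mantissa shifts; None means the value is too
    small for single precision and is not modeled here.
    """
    exponent, fraction = double_fields(x)
    if exponent == 0x7FF:              # infinity or NaN
        return ("store", 0)
    if exponent == 0:                  # zero or double-precision denormal
        return ("store", 0) if fraction == 0 else ("store zero", 0)
    if exponent <= 896:                # below the single-precision normal range
        shift = 897 - exponent
        return ("denormalize and store", shift if shift <= 23 else None)
    return ("store", 0)
```

Under these assumptions, 2**-127 (biased exponent 896) needs one shift and 2**-149 (biased exponent 874) needs 23.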


Because of how floating-point numbers are implemented in the MPC750, there is also a 
case when execution of a store floating-point double (stfd, stfdu, stfdx, stfdux) instruction 
can require internal shifting of the mantissa. This case occurs when the operand of a store 
floating-point double instruction is a denormalized single-precision value. The value could 
be the result of a load floating-point single instruction, a single-precision arithmetic
instruction, or a floating round to single-precision instruction. In these cases, shifting the
mantissa takes from 1 to 23 clock cycles, depending upon the value to be stored. These 
cycles are incurred during the store. 


2.3.4.4 Branch and Flow Control Instructions 


Some branch instructions can redirect instruction execution conditionally based on the 
value of bits in the CR. When the processor encounters one of these instructions, it scans 
the execution pipelines to determine whether an instruction in progress may affect the 
particular CR bit. If no interlock is found, the branch can be resolved immediately by 
checking the bit in the CR and taking the action defined for the branch instruction. 


2.3.4.4.1 Branch Instruction Address Calculation


Branch instructions can alter the sequence of instruction execution. Instruction addresses 
are always assumed to be word aligned; the processors ignore the two low-order bits of the 
generated branch target address. 


Branch instructions compute the EA of the next instruction address using the following 
addressing modes: 


• Branch relative
• Branch conditional to relative address
• Branch to absolute address
• Branch conditional to absolute address
• Branch conditional to link register
• Branch conditional to count register


Note that in the MPC750, all branch instructions (b, ba, bl, bla, bc, bca, bcl, bcla, bclr,
bclrl, bcctr, bcctrl) and condition register logical instructions (crand, cror, crxor, crnand,
crnor, crandc, creqv, crorc, and mcrf) are executed by the BPU. Some of these
instructions can redirect instruction execution conditionally based on the value of bits in the
CR. Whenever the CR bits resolve, the branch direction is either marked as correct or
mispredicted. Correcting a mispredicted branch requires that the MPC750 flush
speculatively executed instructions and restore the machine state to immediately after the
branch. This correction can be done immediately upon resolution of the condition register
bits.


2.3.4.4.2 Branch Instructions


Table 2-42 lists the branch instructions provided by the processors of this family. To 
simplify assembly language programming, a set of simplified mnemonics and symbols is 
provided for the most frequently used forms of branch conditional, compare, trap, rotate 
and shift, and certain other instructions. See Appendix F, “Simplified Mnemonics,” in the 
Programming Environments Manual for a list of simplified mnemonic examples.

Table 2-42. Branch Instructions

Name                                   Mnemonic            Syntax
Branch                                 b (ba bl bla)       target_addr
Branch Conditional                     bc (bca bcl bcla)   BO,BI,target_addr
Branch Conditional to Link Register    bclr (bclrl)        BO,BI
Branch Conditional to Count Register   bcctr (bcctrl)      BO,BI


2.3.4.4.3 Condition Register Logical Instructions 


Condition register logical instructions, shown in Table 2-43, and the Move Condition
Register Field (mcrf) instruction are also defined as flow control instructions.


Table 2-43. Condition Register Logical Instructions

Name                                     Mnemonic   Syntax
Condition Register AND                   crand      crbD,crbA,crbB
Condition Register OR                    cror       crbD,crbA,crbB
Condition Register XOR                   crxor      crbD,crbA,crbB
Condition Register NAND                  crnand     crbD,crbA,crbB
Condition Register NOR                   crnor      crbD,crbA,crbB
Condition Register Equivalent            creqv      crbD,crbA,crbB
Condition Register AND with Complement   crandc     crbD,crbA,crbB
Condition Register OR with Complement    crorc      crbD,crbA,crbB
Move Condition Register Field            mcrf       crfD,crfS








Note that if the LR update option is enabled for any of these instructions, the PowerPC 
architecture defines these forms of the instructions as invalid. 


2.3.4.4.4 Trap Instructions 


The trap instructions shown in Table 2-44 are provided to test for a specified set of 
conditions. If any of the conditions tested by a trap instruction are met, the system trap type 
program exception is taken. For more information, see Section 4.5.7, “Program Exception 
(0x00700).” If the tested conditions are not met, instruction execution continues normally.


Table 2-44. Trap Instructions

Name                  Mnemonic   Syntax
Trap Word Immediate   twi        TO,rA,SIMM
Trap Word             tw         TO,rA,rB











See Appendix F, “Simplified Mnemonics,” in the Programming Environments Manual for 
a complete set of simplified mnemonics.


2.3.4.5 System Linkage Instruction—UISA


The System Call (sc) instruction permits a program to call on the system to perform a 
service; see Table 2-45. See also Section 2.3.6.1, “System Linkage Instructions—OEA,” 
for additional information. 


Table 2-45. System Linkage Instruction—UISA

Name          Mnemonic   Syntax
System Call   sc         -

















Executing this instruction causes the system call exception handler to be invoked. For more
information, see Section 4.5.10, “System Call Exception (0x00C00).”


2.3.4.6 Processor Control Instructions—UISA 


Processor control instructions are used to read from and write to the condition register 
(CR), machine state register (MSR), and special-purpose registers (SPRs). See 
Section 2.3.5.1, “Processor Control Instructions—VEA,” for the mftb instruction and 
Section 2.3.6.2, “Processor Control Instructions—OEA,” for information about the 
instructions used for reading from and writing to the MSR and SPRs. 


2.3.4.6.1 Move to/from Condition Register Instructions

Table 2-46 summarizes the instructions for reading from or writing to the condition register.


Table 2-46. Move to/from Condition Register Instructions

Name                                  Mnemonic   Syntax
Move to Condition Register Fields     mtcrf      CRM,rS
Move to Condition Register from XER   mcrxr      crfD
Move from Condition Register          mfcr       rD











Implementation Note—The PowerPC architecture indicates that in some implementations
the Move to Condition Register Fields (mtcrf) instruction may perform more slowly when
only a portion of the fields are updated as opposed to all of the fields. The condition register
access latency for the MPC750 is the same in both cases.


2.3.4.6.2 Move to/from Special-Purpose Register Instructions (UISA) 
Table 2-47 lists the mtspr and mfspr instructions. 


Table 2-47. Move to/from Special-Purpose Register Instructions (UISA) 











Name                                 Mnemonic   Syntax
Move to Special-Purpose Register     mtspr      SPR,rS
Move from Special-Purpose Register   mfspr      rD,SPR

Table 2-48 lists the SPR numbers for both user- and supervisor-level accesses.

Table 2-48. PowerPC Encodings

                          SPR ¹
Register Name   Decimal   spr[5–9]   spr[0–4]   Access             mfspr/mtspr
CTR             9         00000      01001      User (UISA)        Both
DABR            1013      11111      10101      Supervisor (OEA)   Both
DAR             19        00000      10011      Supervisor (OEA)   Both
DBAT0L          537       10000      11001      Supervisor (OEA)   Both
DBAT0U          536       10000      11000      Supervisor (OEA)   Both
DBAT1L          539       10000      11011      Supervisor (OEA)   Both
DBAT1U          538       10000      11010      Supervisor (OEA)   Both
DBAT2L          541       10000      11101      Supervisor (OEA)   Both
DBAT2U          540       10000      11100      Supervisor (OEA)   Both
DBAT3L          543       10000      11111      Supervisor (OEA)   Both
DBAT3U          542       10000      11110      Supervisor (OEA)   Both
DEC             22        00000      10110      Supervisor (OEA)   Both
DSISR           18        00000      10010      Supervisor (OEA)   Both
EAR             282       01000      11010      Supervisor (OEA)   Both
IBAT0L          529       10000      10001      Supervisor (OEA)   Both
IBAT0U          528       10000      10000      Supervisor (OEA)   Both
IBAT1L          531       10000      10011      Supervisor (OEA)   Both
IBAT1U          530       10000      10010      Supervisor (OEA)   Both
IBAT2L          533       10000      10101      Supervisor (OEA)   Both
IBAT2U          532       10000      10100      Supervisor (OEA)   Both
IBAT3L          535       10000      10111      Supervisor (OEA)   Both
IBAT3U          534       10000      10110      Supervisor (OEA)   Both
LR              8         00000      01000      User (UISA)        Both
PVR             287       01000      11111      Supervisor (OEA)   mfspr
SDR1            25        00000      11001      Supervisor (OEA)   Both
SPRG0           272       01000      10000      Supervisor (OEA)   Both
SPRG1           273       01000      10001      Supervisor (OEA)   Both
SPRG2           274       01000      10010      Supervisor (OEA)   Both
SPRG3           275       01000      10011      Supervisor (OEA)   Both
SRR0            26        00000      11010      Supervisor (OEA)   Both
SRR1            27        00000      11011      Supervisor (OEA)   Both
TBL ²           268       01000      01100      Supervisor (OEA)   mtspr
                284       01000      11100      Supervisor (OEA)   mtspr
TBU ²           269       01000      01101      Supervisor (OEA)   mtspr
                285       01000      11101      Supervisor (OEA)   mtspr
XER             1         00000      00001      User (UISA)        Both

Notes:

1 The order of the two 5-bit halves of the SPR number is reversed compared with actual instruction coding. For mtspr
and mfspr instructions, the SPR number coded in assembly language does not appear directly as a 10-bit binary
number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with
the high-order five bits appearing in bits 16–20 of the instruction and the low-order five bits in bits 11–15.

2 The TB registers are referred to as TBRs rather than SPRs and can be written to using the mtspr instruction in
supervisor mode and the TBR numbers here. The TB registers can be read in user mode using either the mftb or
mfspr instruction and specifying TBR 268 for TBL and TBR 269 for TBU.
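
The field swap described in note 1 can be sketched as follows. The helper names are hypothetical. The primary opcode (31) and extended opcodes (339 for mfspr, 467 for mtspr) come from the PowerPC instruction encoding rather than from this section, and bit 0 is taken to be the most significant bit of the 32-bit instruction word.

```python
def split_spr(spr):
    """Split a 10-bit SPR number into its halves in instruction order.

    Returns (bits 11-15, bits 16-20): the low-order five bits come first
    in the instruction, followed by the high-order five bits.
    """
    if not 0 <= spr < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    return spr & 0x1F, (spr >> 5) & 0x1F


def _encode(reg, spr, extended_opcode):
    first, second = split_spr(spr)
    # Instruction bit n (big-endian numbering) sits at integer bit 31 - n.
    return (31 << 26) | (reg << 21) | (first << 16) | (second << 11) | (extended_opcode << 1)


def encode_mfspr(rd, spr):
    """Encode mfspr rD,SPR as a 32-bit instruction word."""
    return _encode(rd, spr, 339)


def encode_mtspr(spr, rs):
    """Encode mtspr SPR,rS as a 32-bit instruction word."""
    return _encode(rs, spr, 467)
```

For SPR 1008 the halves are 10000 (bits 11–15) and 11111 (bits 16–20), the reverse of the spr[5–9], spr[0–4] column order used in the table.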


Encodings for the MPC750-specific SPRs are listed in Table 2-49.

Table 2-49. SPR Encodings for MPC750-Defined Registers (mfspr)

                          SPR ¹
Register Name   Decimal   spr[5–9]   spr[0–4]   Access       mfspr/mtspr
DABR            1013      11111      10101      Supervisor   Both
HID0            1008      11111      10000      Supervisor   Both
HID1            1009      11111      10001      Supervisor   Both
IABR            1010      11111      10010      Supervisor   Both
ICTC            1019      11111      11011      Supervisor   Both
L2CR            1017      11111      11001      Supervisor   Both
MMCR0           952       11101      11000      Supervisor   Both
MMCR1           956       11101      11100      Supervisor   Both
PMC1            953       11101      11001      Supervisor   Both
PMC2            954       11101      11010      Supervisor   Both
PMC3            957       11101      11101      Supervisor   Both
PMC4            958       11101      11110      Supervisor   Both
SIA             955       11101      11011      Supervisor   Both
THRM1           1020      11111      11100      Supervisor   Both
THRM2           1021      11111      11101      Supervisor   Both
THRM3           1022      11111      11110      Supervisor   Both
UMMCR0          936       11101      01000      User         mfspr
UMMCR1          940       11101      01100      User         mfspr
UPMC1           937       11101      01001      User         mfspr
UPMC2           938       11101      01010      User         mfspr
UPMC3           941       11101      01101      User         mfspr
UPMC4           942       11101      01110      User         mfspr
USIA            939       11101      01011      User         mfspr

Note:

1 The order of the two 5-bit halves of the SPR number is reversed compared with actual instruction coding.
For mtspr and mfspr instructions, the SPR number coded in assembly language does not appear directly as a 10-bit
binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction,
with the high-order 5 bits appearing in bits 16–20 of the instruction and the low-order 5 bits in bits 11–15.


2.3.4.7 Memory Synchronization Instructions—UISA


Memory synchronization instructions control the order in which memory operations are 
completed with respect to asynchronous events, and the order in which memory operations 
are seen by other processors or memory access mechanisms. See Chapter 3, “L1 Instruction 
and Data Cache Operation,” for additional information about these instructions and about 
related aspects of memory synchronization. See Table 2-50 for a summary. 


Table 2-50. Memory Synchronization Instructions—UISA

Name                             Mnemonic   Syntax
Load Word and Reserve Indexed    lwarx      rD,rA,rB
Store Word Conditional Indexed   stwcx.     rS,rA,rB
Synchronize                      sync       -

Implementation notes for lwarx and stwcx.:

Programmers can use lwarx with stwcx. to emulate common semaphore
operations such as test and set, compare and swap, exchange memory, and
fetch and add. Both instructions must use the same EA. Reservation granularity
is implementation-dependent. The MPC750 makes reservations on behalf of
aligned 32-byte sections of the memory address space. If the W bit is set,
executing lwarx and stwcx. to a page marked write-through does not cause a
DSI exception, but DSI exceptions can result for other reasons. If the location is
not word-aligned, an alignment exception occurs.

The stwcx. instruction is the only load/store instruction with a valid form if Rc is
set. If Rc is zero, executing stwcx. sets CR0 to an undefined value. In
general, stwcx. always causes a transaction on the external bus and thus
operates with slightly worse performance characteristics than normal store
operations.

Implementation notes for sync:

Because it delays subsequent instructions until all previous instructions
complete to where they cannot cause an exception, sync is a barrier against
store gathering. Additionally, all load/store cache/bus activities initiated by prior
instructions are completed. Touch load operations (dcbt, dcbtst) must complete
address translation, but need not complete on the bus. If HID0[ABE] = 1, sync
completes after a successful broadcast.

The latency of sync depends on the processor state when it is dispatched and
on various system-level situations. Therefore, frequent use of sync may
degrade performance.
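
The way lwarx and stwcx. combine into a semaphore-style primitive such as fetch and add can be sketched with a toy software model. The class, method names, and memory dictionary are hypothetical; only the 32-byte reservation granule and the word-alignment requirement are taken from the notes above. Real hardware loses a reservation under more conditions than the single "store by another master" case modeled here.

```python
class ReservationModel:
    """Toy single-processor model of lwarx/stwcx. reservations."""

    GRANULE = 32  # bytes; the MPC750 reserves aligned 32-byte sections

    def __init__(self):
        self.memory = {}         # word address -> 32-bit value
        self.reservation = None  # base address of the reserved granule

    def _granule(self, ea):
        return ea & ~(self.GRANULE - 1)

    @staticmethod
    def _check_word_aligned(ea):
        if ea % 4:
            raise ValueError("alignment exception: EA is not word-aligned")

    def lwarx(self, ea):
        self._check_word_aligned(ea)
        self.reservation = self._granule(ea)
        return self.memory.get(ea, 0)

    def stwcx(self, ea, value):
        """Store only if the reservation still holds; return success."""
        self._check_word_aligned(ea)
        success = self.reservation == self._granule(ea)
        self.reservation = None  # the reservation is consumed either way
        if success:
            self.memory[ea] = value & 0xFFFFFFFF
        return success

    def store_by_other_master(self, ea, value):
        """Model another bus master writing; clears a matching reservation."""
        self.memory[ea & ~3] = value & 0xFFFFFFFF
        if self.reservation == self._granule(ea):
            self.reservation = None


def fetch_and_add(model, ea, delta):
    """Retry the lwarx/stwcx. pair until the conditional store succeeds."""
    while True:
        old = model.lwarx(ea)
        if model.stwcx(ea, old + delta):
            return old
```

In this model, a store by another master to any word in the same 32-byte granule causes the following stwcx. to fail, which is why the retry loop is needed.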

















System designs with an L2 cache should take special care to recognize the hardware 
signaling caused by a SYNC bus operation and perform the appropriate actions to 
guarantee that memory references that may be queued internally to the L2 cache have been 
performed globally. 


See Section 2.3.5.2, “Memory Synchronization Instructions—VEA,” for details about additional
memory synchronization (eieio and isync) instructions. 


In the PowerPC architecture, the Rc bit must be zero for most load and store instructions.
If Rc is set, the instruction form is invalid for sync and lwarx instructions. If the MPC750
encounters one of these invalid instruction forms, it sets CR0 to an undefined value.


2.3.5 PowerPC VEA Instructions 


The PowerPC virtual environment architecture (VEA) describes the semantics of the 
memory model that can be assumed by software processes, and includes descriptions of the 
cache model, cache control instructions, address aliasing, and other related issues. 
Implementations that conform to the VEA also adhere to the UISA, but may not necessarily
adhere to the OEA.


This section describes additional instructions that are provided by the VEA. 


2.3.5.1 Processor Control Instructions—VEA


In addition to the move to condition register instructions (specified by the UISA), the VEA 
defines the mftb instruction (user-level instruction) for reading the contents of the time base 
register; see Chapter 3, “L1 Instruction and Data Cache Operation,” for more information.
Table 2-51 shows the mftb instruction. 


Table 2-51. Move from Time Base Instruction

Name                  Mnemonic   Syntax
Move from Time Base   mftb       rD,TBR














Simplified mnemonics are provided for the mftb instruction so it can be coded with the 
TBR name as part of the mnemonic rather than requiring it to be coded as an operand. See
Appendix F, “Simplified Mnemonics,” in the Programming Environments Manual for
simplified mnemonic examples and for simplified mnemonics for Move from Time Base 
(mftb) and Move from Time Base Upper (mftbu), which are variants of the mftb 
instruction rather than of mfspr. The mftb instruction serves as both a basic and simplified 
mnemonic. Assemblers recognize an mftb mnemonic with two operands as the basic form, 
and an mftb mnemonic with one operand as the simplified form. Note that the MPC750 
ignores the extended opcode differences between mftb and mfspr by ignoring bit 25 and 
treating both instructions identically. 


Implementation Notes—The following information is useful with respect to using the 
time base implementation in the MPC750: 


• The MPC750 allows user-mode read access to the time base counter through the use
of the Move from Time Base (mftb) and the Move from Time Base Upper (mftbu)
instructions. As a 32-bit implementation, the MPC750 can access TBU and TBL
only separately, whereas 64-bit implementations can access the entire TB register at
once.

• The time base counter is clocked at a frequency that is one-fourth that of the bus
clock. Counting is enabled by assertion of the time base enable (TBE) input signal.
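
Because TBU and TBL can only be read separately, software generally has to guard against a carry from TBL into TBU between the two reads. The sketch below shows the conventional read-upper, read-lower, re-read-upper loop together with a tick-to-seconds conversion based on the one-fourth bus clock rate stated above. The function names and the FakeTimeBase stand-in are hypothetical; on real hardware the two callables would wrap mftbu and mftb.

```python
def read_time_base(read_tbu, read_tbl):
    """Return a consistent 64-bit time base value from two 32-bit reads."""
    while True:
        upper = read_tbu()
        lower = read_tbl()
        # If TBU changed, TBL wrapped between the reads; try again.
        if read_tbu() == upper:
            return (upper << 32) | lower


def ticks_to_seconds(ticks, bus_clock_hz):
    """The time base advances once every four bus clocks."""
    return ticks * 4 / bus_clock_hz


class FakeTimeBase:
    """Stand-in counter that advances by one tick on every read."""

    def __init__(self, start):
        self.value = start

    def _tick(self):
        current = self.value
        self.value = (self.value + 1) & 0xFFFFFFFFFFFFFFFF
        return current

    def read_tbu(self):
        return self._tick() >> 32

    def read_tbl(self):
        return self._tick() & 0xFFFFFFFF
```

Starting the fake counter at 0xFFFFFFFF forces a carry during the first pass, so the loop discards that sample and returns a value taken entirely after the carry.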


2.3.5.2 Memory Synchronization Instructions—VEA 


Memory synchronization instructions control the order in which memory operations are 
completed with respect to asynchronous events, and the order in which memory operations 
are seen by other processors or memory access mechanisms. See Chapter 3, “L1 Instruction 
and Data Cache Operation,” for more information about these instructions and about related 
aspects of memory synchronization. 


In addition to the sync instruction (specified by UISA), the VEA defines the Enforce 
In-Order Execution of I/O (eieio) and Instruction Synchronize (isync) instructions. The 
number of cycles required to complete an eieio instruction depends on system parameters 
and on the processor's state when the instruction is issued. As a result, frequent use of this 
instruction may degrade performance slightly. 


Table 2-52 describes the memory synchronization instructions defined by the VEA.

Table 2-52. Memory Synchronization Instructions—VEA

Name                                Mnemonic   Syntax
Enforce In-Order Execution of I/O   eieio      -
Instruction Synchronize             isync      -

Implementation notes for eieio:

The eieio instruction is dispatched to the LSU and executes after all previous
cache-inhibited or write-through accesses are performed; all subsequent
instructions that generate such accesses execute after eieio. If HID0[ABE] = 1, an
EIEIO operation is broadcast on the external bus to enforce ordering in the
external memory system. The eieio operation bypasses the L2 cache and is
forwarded to the bus unit. If HID0[ABE] = 0, the operation is not broadcast.
Because the MPC750 does not reorder noncacheable accesses, eieio is not
needed to force ordering. However, if store gathering is enabled and an eieio is
detected in a store queue, stores are not gathered. If HID0[ABE] = 1, broadcasting
eieio prevents external devices, such as a bus bridge chip, from gathering stores.

Implementation notes for isync:

The isync instruction is refetch serializing; that is, it causes the MPC750 to purge
its instruction queue and wait for all prior instructions to complete before refetching
the next instruction, which is not executed until all previous instructions complete
to the point where they cannot cause an exception. The isync instruction does not
wait for all pending stores in the store queue to complete. Any instruction after an
isync sees all effects of prior instructions.

















2.3.5.3 Memory Control Instructions—VEA 
Memory control instructions can be classified as follows: 


• Cache management instructions (user-level and supervisor-level)
• Segment register manipulation instructions (OEA)
• Translation lookaside buffer management instructions (OEA)


This section describes the user-level cache management instructions defined by the VEA. 
See Section 2.3.6.3, “Memory Control Instructions—OEA,” for information about 
supervisor-level cache, segment register manipulation, and translation lookaside buffer 
management instructions. 


2.3.5.3.1 User-Level Cache Instructions—VEA


The instructions summarized in this section help user-level programs manage on-chip 
caches if they are implemented. See Chapter 3, “L1 Instruction and Data Cache Operation,” 
for more information about cache topics. The following sections describe how these 
operations are treated with respect to the MPC750’s cache. 


As with other memory-related instructions, the effects of cache management instructions 
on memory are weakly-ordered. If the programmer must ensure that cache or other 
instructions have been performed with respect to all other processors and system 
mechanisms, a sync instruction must be placed after those instructions. 


Note that the MPC750 interprets cache control instructions (icbi, dcbi, dcbf, dcbz, and
dcbst) as if they pertain only to the local L1 and L2 cache. A dcbz (with M set) is always
broadcast on the 60x bus. The dcbi, dcbf, and dcbst operations are broadcast if
HID0[ABE] is set.
The MPC750 never broadcasts an icbi. Of the broadcast cache operations, the MPC750
snoops only dcbz, regardless of the HID0[ABE] setting. Any bus activity caused by other
cache instructions results directly from performing the operation on the MPC750 cache. All
cache control instructions to T = 1 space are no-ops. For information about how cache control
instructions affect the L2, see Chapter 9, “L2 Cache Interface Operation.”
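
The broadcast rules in the two paragraphs above reduce to a small decision function. This is a sketch with a hypothetical name and signature; treating dcbz with M = 0 as not broadcast is an assumption, since the text only states the M = 1 case.

```python
def is_broadcast(op, hid0_abe, m_bit=False):
    """Return True if the cache operation is broadcast on the 60x bus.

    op       -- mnemonic: "icbi", "dcbz", "dcbi", "dcbf", or "dcbst"
    hid0_abe -- state of HID0[ABE]
    m_bit    -- M (memory coherency) attribute of the target page
    """
    if op == "icbi":
        return False                 # never broadcast
    if op == "dcbz":
        return m_bit                 # broadcast with M set, whatever HID0[ABE] is
    if op in ("dcbi", "dcbf", "dcbst"):
        return hid0_abe              # broadcast only if HID0[ABE] is set
    raise ValueError(f"not a cache control instruction covered here: {op}")
```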


Table 2-53 summarizes the cache instructions defined by the VEA. Note that these 
instructions are accessible to user-level programs. 


Table 2-53. User-Level Cache Instructions

Name                                 Mnemonic   Syntax
Data Cache Block Touch ¹             dcbt       rA,rB
Data Cache Block Touch for Store ¹   dcbtst     rA,rB
Data Cache Block Set to Zero         dcbz       rA,rB
Data Cache Block Store               dcbst      rA,rB
Data Cache Block Flush               dcbf       rA,rB
Instruction Cache Block Invalidate   icbi       rA,rB

Implementation notes for dcbt:

The VEA defines this instruction to allow for potential system performance
enhancements through the use of software-initiated prefetch hints.
Implementations are not required to take any action based on execution of
this instruction, but they may prefetch the cache block corresponding to the
EA into their cache. When dcbt executes, the MPC750 checks for
protection violations (as for a load instruction). This instruction is treated as
a no-op for the following cases:

• A valid translation is not found either in BAT or TLB.
• The access causes a protection violation.
• The page is mapped cache-inhibited, G = 1 (guarded), or T = 1.
• The cache is locked or disabled.
• HID0[NOOPTI] = 1.

Otherwise, if no data is in the cache location, the MPC750 requests a
cache line fill (with intent to modify). Data brought into the cache is
validated as if it were a load instruction. The memory reference of a dcbt
sets the reference bit.

Implementation notes for dcbtst:

This instruction behaves like dcbt.

Implementation notes for dcbz:

The EA is computed, translated, and checked for protection violations. For
cache hits, four beats of zeros are written to the cache block and the tag is
marked M. For cache misses with the replacement block marked E, the
zero line fill is performed and the cache block is marked M. However, if the
replacement block is marked M, the contents are written back to memory
first. The instruction executes regardless of whether the cache is locked; if
the cache is disabled, an alignment exception occurs. If M = 1 (coherency
enforced), the address is broadcast to the bus before the zero line fill.

The exception priorities (from highest to lowest) are as follows:

1 Cache disabled: alignment exception
2 Page marked write-through or cache-inhibited: alignment exception
3 BAT protection violation: DSI exception
4 TLB protection violation: DSI exception

dcbz is the only cache instruction that broadcasts even if HID0[ABE] = 0.

Implementation notes for dcbst:

The EA is computed, translated, and checked for protection violations.

• For cache hits with the tag marked E, no further action is taken.
• For cache hits with the tag marked M, the cache block is written back to
memory and marked E.

A dcbst is not broadcast unless HID0[ABE] = 1 regardless of WIMG
settings. The instruction acts like a load with respect to address translation
and memory protection. It executes regardless of whether the cache is
disabled or locked.

The exception priorities (from highest to lowest) for dcbst are as follows:

1 BAT protection violation: DSI exception
2 TLB protection violation: DSI exception

Implementation notes for dcbf:

The EA is computed, translated, and checked for protection violations.

• For cache hits with the tag marked M, the cache block is written back to
memory and the cache entry is invalidated.
• For cache hits with the tag marked E, the entry is invalidated.
• For cache misses, no further action is taken.

A dcbf is not broadcast unless HID0[ABE] = 1 regardless of WIMG
settings. The instruction acts like a load with respect to address translation
and memory protection. It executes regardless of whether the cache is
disabled or locked.

The exception priorities (from highest to lowest) for dcbf are as follows:

1 BAT protection violation: DSI exception
2 TLB protection violation: DSI exception

Implementation notes for icbi:

This instruction performs a virtual lookup into the instruction cache (index
only). The address is not translated, so it cannot cause an exception. All
ways of a selected set are invalidated regardless of whether the cache is
disabled or locked. The MPC750 never broadcasts icbi onto the 60x bus.














Note:
1 A program that uses dcbt and dcbtst instructions improperly performs less efficiently. To improve performance,
HID0[NOOPTI] may be set, which causes dcbt and dcbtst to be no-oped at the cache. They then cause no bus activity
and incur only a 1-clock execution latency. The default state of this bit is zero, which enables the use of these
instructions.


2.3.5.4 Optional External Control Instructions 


The PowerPC architecture defines an optional external control feature that, if implemented,
is supported by the two external control instructions, eciwx and ecowx. These instructions
allow a user-level program to communicate with a special-purpose device. The MPC750
provides these instructions, which are summarized in Table 2-54.

Table 2-54. External Control Instructions

Name | Mnemonic | Syntax
External Control In Word Indexed | eciwx | rD,rA,rB
External Control Out Word Indexed | ecowx | rS,rA,rB

Implementation Notes (both instructions): A transfer size of 4 bytes is implied; the TBST
and TSIZ[0-2] signals are redefined to specify the Resource ID (RID), copied from bits
EAR[28-31]. For these operations, TBST carries the EAR[28] data. Misaligned operands for
these instructions cause an alignment exception. Addressing a location where
SR[T] = 1 causes a DSI exception. If MSR[DR] = 0, a programming error occurs
and the physical address on the bus is undefined.

Note: These instructions are optional to the PowerPC architecture.

The eciwx/ecowx instructions let a system designer map special devices in an alternative 
way. The MMU translation of the EA is not used to select the special device, as it is used 
in most instructions such as loads and stores. Rather, it is used as an address operand that 
is passed to the device over the address bus. Four other signals (the burst and size signals 
on the 60x bus) are used to select the device; these four signals output the 4-bit resource ID 
(RID) field located in the EAR. The eciwx instruction also loads a word from the data bus 
that is output by the special device. For more information about the relationship between 
these instructions and the system interface, refer to Chapter 7, “Signal Descriptions.” 
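The mapping of the RID onto the burst and size signals can be sketched as follows. This is an illustrative model only: the function name rid_signals is hypothetical, the values are logical bit values that ignore signal polarity, and the assignment of EAR[29-31] to TSIZ[0-2] in that order is an assumption inferred from the statement that TBST carries EAR[28].

```python
def rid_signals(ear: int) -> dict:
    """Model how the 4-bit RID in EAR[28-31] appears on TBST and TSIZ[0-2].

    PowerPC numbers bits from the most-significant end, so EAR[28-31]
    are the four least-significant bits of the 32-bit register.
    """
    rid = ear & 0xF
    return {
        "RID": rid,
        "TBST": (rid >> 3) & 1,                 # EAR[28]
        # Assumed ordering: TSIZ0 = EAR[29], TSIZ1 = EAR[30], TSIZ2 = EAR[31]
        "TSIZ": [(rid >> 2) & 1, (rid >> 1) & 1, rid & 1],
    }


signals = rid_signals(0x8000000A)   # RID field holds 0b1010
```

The upper bits of the EAR are ignored here because only the RID field takes part in device selection.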


2.3.6 PowerPC OEA Instructions 


The PowerPC operating environment architecture (OEA) includes the structure of the 
memory management model, supervisor-level registers, and the exception model. 
Implementations that conform to the OEA also adhere to the UISA and the VEA. This 
section describes the instructions provided by the OEA. 


2.3.6.1 System Linkage Instructions—OEA 


This section describes the system linkage instructions (see Table 2-55). The user-level sc
instruction lets a user program call on the system to perform a service and causes the
processor to take a system call exception. The supervisor-level rfi instruction is used for
returning from an exception handler.


Table 2-55. System Linkage Instructions—OEA 














Name | Mnemonic | Syntax (implementation notes follow each entry)

System Call | sc | (no operands)
    The sc instruction is context-synchronizing.

Return from Interrupt | rfi | (no operands)
    The rfi instruction is context-synchronizing. For the MPC750, this means the
    rfi instruction works its way to the final stage of the execution pipeline,
    updates architected registers, and redirects the instruction flow.











2.3.6.2 Processor Control Instructions—OEA 


This section describes the processor control instructions used to access the MSR and the
SPRs. Table 2-56 lists instructions for accessing the MSR.

Table 2-56. Move to/from Machine State Register Instructions

Name | Mnemonic | Syntax
Move to Machine State Register | mtmsr | rS
Move from Machine State Register | mfmsr | rD





The OEA defines encodings of mtspr and mfspr to provide access to supervisor-level 
registers. The instructions are listed in Table 2-57. 


Table 2-57. Move to/from Special-Purpose Register Instructions (OEA) 








Name Mnemonic Syntax 
Move to Special-Purpose Register mtspr SPR,rS 
Move from Special-Purpose Register mfspr rD,SPR 

















Encodings for the architecture-defined SPRs are listed in Table 2-48. Encodings for 
MPC750-specific, supervisor-level SPRs are listed in Table 2-49. Simplified mnemonics 
are provided for mtspr and mfspr in Appendix F, “Simplified Mnemonics,” in the 
Programming Environments Manual. For a discussion of context synchronization 
requirements when altering certain SPRs, refer to Appendix E, “Synchronization 
Programming Examples,” in the Programming Environments Manual. 


2.3.6.3 Memory Control Instructions—OEA 


Memory control instructions include the following:

• Cache management instructions (supervisor-level and user-level)
• Segment register manipulation instructions
• Translation lookaside buffer management instructions

This section describes supervisor-level memory control instructions. Section 2.3.5.3,
“Memory Control Instructions—VEA,” describes user-level memory control instructions.

2.3.6.3.1 Supervisor-Level Cache Management Instruction—(OEA)


Table 2-58 lists the only supervisor-level cache management instruction. 


Table 2-58. Supervisor-Level Cache Management Instruction 








Name | Mnemonic | Syntax (implementation notes follow each entry)

Data Cache Block Invalidate | dcbi | rA,rB
    The EA is computed, translated, and checked for protection violations. For cache
    hits, the cache block is marked I regardless of whether it was marked E or M. A dcbi
    is not broadcast unless HID0[ABE] = 1, regardless of WIMG settings. The instruction
    acts like a store with respect to address translation and memory protection. It
    executes regardless of whether the cache is disabled or locked.

    The exception priorities (from highest to lowest) for dcbi are as follows:
    1 BAT protection violation: DSI exception
    2 TLB protection violation: DSI exception

See Section 2.3.5.3.1, “User-Level Cache Instructions—VEA,” for cache instructions that
provide user-level programs the ability to manage the on-chip caches. If the effective
address references a direct-store segment, the instruction is treated as a no-op.


2.3.6.3.2 Segment Register Manipulation Instructions (OEA) 


The instructions listed in Table 2-59 provide access to the segment registers for 32-bit 
implementations. These instructions operate completely independently of the MSR[IR] and 
MSR[DR] bit settings. Refer to “Synchronization Requirements for Special Registers and
for Lookaside Buffers,” in Chapter 2, “PowerPC Register Set,” of the Programming
Environments Manual for serialization requirements and other recommended precautions
to observe when manipulating the segment registers.


Table 2-59. Segment Register Manipulation Instructions 


























Name | Mnemonic | Syntax (implementation notes follow each entry)

Move to Segment Register | mtsr | SR,rS

Move to Segment Register Indirect | mtsrin (see Note 1) | rS,rB

Move from Segment Register | mfsr | rD,SR
    The shadow SRs in the instruction MMU can be read by setting HID0[RISEG]
    before executing mfsr.

Move from Segment Register Indirect | mfsrin | rD,rB

Notes:
1 The MPC750 and MPC755 have a restriction on the use of the mtsr and mtsrin instructions not described in the
Programming Environments Manual. The MPC750 and MPC755 require that an isync instruction be executed after
either an mtsr or mtsrin instruction. This isync instruction must occur after the execution of the mtsr or mtsrin and
before the data address translation mechanism uses any of the on-chip segment registers.


2.3.6.3.3 Translation Lookaside Buffer Management Instructions—(OEA) 


The address translation mechanism is defined in terms of the segment descriptors and page 
table entries (PTEs) that the PowerPC architecture defines for locating the 
logical-to-physical address mapping for a particular access. These segment descriptors and 
PTEs reside in segment registers and page tables in memory, respectively. 


See Chapter 5, “Memory Management,” for more information about TLB operations.
Table 2-60 summarizes the operation of the TLB instructions in the MPC750.


Table 2-60. Translation Lookaside Buffer Management Instruction 











Name | Mnemonic | Syntax (implementation notes follow each entry)

TLB Invalidate Entry | tlbie | rB
    Invalidates both ways in both instruction and data TLB entries at the index
    provided by EA[14-19]. It executes regardless of the MSR[DR] and MSR[IR]
    settings. To invalidate all entries in both TLBs, the programmer should issue 64
    tlbie instructions that each successively increment this field.

TLB Synchronize | tlbsync | (no operands)
    On the MPC750, the only function tlbsync serves is to wait for the TLBISYNC
    signal to go inactive.

Implementation Note: The tlbia instruction is optional for an implementation if its
effects can be achieved through some other mechanism. Therefore, it is not implemented
on the MPC750. As described above, tlbie can be used to invalidate a particular index of
the TLB based on EA[14-19]; a sequence of 64 tlbie instructions followed by a tlbsync
instruction invalidates all the TLB structures (for EA[14-19] = 0, 1, 2,..., 63). Attempting
to execute tlbia causes an illegal instruction program exception.
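The 64 effective addresses needed for a full TLB invalidation can be enumerated as in the following sketch. It only computes the rB operand values; the names tlbie_addresses, TLB_INDEX_SHIFT, and TLB_SETS are hypothetical, and the actual tlbie and tlbsync instructions would still have to be issued by supervisor-level code.

```python
# EA[14-19] is a 6-bit field. With big-endian bit numbering, EA[19] sits
# 31 - 19 = 12 positions above the least-significant bit.
TLB_INDEX_SHIFT = 12
TLB_SETS = 64


def tlbie_addresses() -> list:
    """Return one effective address per TLB index (EA[14-19] = 0..63)."""
    return [index << TLB_INDEX_SHIFT for index in range(TLB_SETS)]


addresses = tlbie_addresses()
# Each address would be passed in rB to one tlbie; a single tlbsync follows
# the last tlbie, as described in the implementation note.
```

Stepping the address by 0x1000 (one 4-Kbyte page) increments the EA[14-19] field by one each time, which matches the "successively increment this field" wording in Table 2-60.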


The presence and exact semantics of the TLB management instructions are 
implementation-dependent. To minimize compatibility problems, system software should 
incorporate uses of these instructions into subroutines. 


2.3.7 Recommended Simplified Mnemonics 


To simplify assembly language coding, a set of alternative mnemonics is provided for some 
frequently used operations (such as no-op, load immediate, load address, move register, and 
complement register). Programs written to be portable across the various assemblers for the 
PowerPC architecture should not assume the existence of mnemonics not described in this 
document. 


For a complete list of simplified mnemonics, see Appendix F, “Simplified Mnemonics,” in 
the Programming Environments Manual. 
L1 Instruction and Data Cache Operation


This chapter describes the on-chip instruction and data caches of the MPC750. Note that 
the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the 
MPC750 apply for the MPC755 except as noted in Appendix C, “MPC755 Embedded G3 
Microprocessor.” 


The MPC750 microprocessor contains separate 32-Kbyte, eight-way set associative 
instruction and data caches to allow the execution units and registers rapid access to 
instructions and data. This chapter describes the organization of the on-chip instruction and 
data caches, the MEI cache coherency protocol, cache control instructions, various cache 
operations, and the interaction between the caches, the load/store unit (LSU), the 
instruction unit, and the bus interface unit (BIU). 


Note that in this chapter, the term ‘multiprocessor’ is used in the context of maintaining
cache coherency. These multiprocessor devices could be actual processors or other devices
that can access system memory, maintain their own caches, and function as bus masters
requiring cache coherency.

The MPC750 cache implementation has the following characteristics:

• There are two separate 32-Kbyte instruction and data caches (Harvard architecture).

• Both instruction and data caches are eight-way set associative.

• The caches implement a pseudo least-recently-used (PLRU) replacement algorithm
  within each set.

• The cache directories are physically addressed. The physical (real) address tag is
  stored in the cache directory.

• Both the instruction and data caches have 32-byte cache blocks. A cache block is the
  block of memory that a coherency state describes, also referred to as a cache line.

• Two coherency state bits for each data cache block allow encoding for three states:
  - Modified (Exclusive) (M)
  - Exclusive (Unmodified) (E)
  - Invalid (I)

• A single coherency state bit for each instruction cache block allows encoding for two
  possible states:
  - Invalid (INV)
  - Valid (VAL)

• Each cache can be invalidated or locked by setting the appropriate bits in the
  hardware implementation-dependent register 0 (HID0), a special-purpose register
  (SPR) specific to the MPC750.


The MPC750 supports a fully-coherent 4-Gbyte physical memory address space. Bus 
snooping is used to drive the MEI three-state cache coherency protocol that ensures the 
coherency of global memory with respect to the processor’s data cache. The MEI protocol 
is described in Section 3.3.2, “MEI Protocol.” 


On a cache miss, the MPC750’s cache blocks are filled in four beats of 64 bits each. The 
burst fill is performed as a critical-double-word-first operation; the critical double word is 
simultaneously written to the cache and forwarded to the requesting unit, thus minimizing 
stalls due to cache fill latency. 
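The order in which the four 64-bit beats arrive can be illustrated with a short sketch. It assumes that the beats following the critical double word proceed sequentially and wrap around within the 32-byte block; the text above states only that the critical double word comes first. The function name burst_order is hypothetical.

```python
DOUBLE_WORDS_PER_BLOCK = 4   # four beats of 64 bits fill one 32-byte block


def burst_order(address: int) -> list:
    """Return the double-word indices (0-3) in the order they are transferred.

    The critical double word is selected by A[27-28], that is, bits 4:3
    when counting from the least-significant bit.
    """
    critical = (address >> 3) & 0x3
    # Assumed: the remaining beats follow sequentially with wrap-around.
    return [(critical + beat) % DOUBLE_WORDS_PER_BLOCK
            for beat in range(DOUBLE_WORDS_PER_BLOCK)]


order = burst_order(0x1010)   # miss on an address in the third double word
```

Under this assumption, a miss in the third double word of a block is serviced first, so the requesting unit does not wait for the earlier double words to arrive.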


The instruction and data caches are integrated into the MPC750 as shown in Figure 3-1. 


[Block diagram not reproduced. Labeled blocks: Instruction Unit; Load/Store Unit (LSU);
I-Cache and D-Cache, each a 32-Kbyte, 8-way set-associative array with cache tags and
cache logic; MMU/L2 BIU (MPC750 only)/60x BIU. Labeled paths: Instructions (0-127),
Instructions (0-63), Data (0-63), EA (20-26), PA (0-19), PA (0-31).
EA: Effective Address. PA: Physical Address.]

Figure 3-1. Cache Integration


Both caches are tightly coupled to the MPC750’s bus interface unit to allow efficient access 
to the system memory controller and other bus masters. The bus interface unit receives 
requests for bus operations from the instruction and data caches, and executes the 
operations per the 60x bus protocol. The BIU provides address queues, prioritizing logic, 
and bus control logic. The BIU captures snoop addresses for data cache, address queue, and
memory reservation (lwarx and stwcx. instruction) operations.


The data cache provides buffers for load and store bus operations. All the data for the 
corresponding address queues (load and store data queues) is located in the data cache. The 
data queues are considered temporary storage for the cache and not part of the BIU. The 
data cache also provides storage for the cache tags required for memory coherency and
performs the cache block replacement PLRU function. 


The data cache supplies data to the GPRs and FPRs by means of the load/store unit. The 
MPC750’s LSU is directly coupled to the data cache to allow efficient movement of data to 
and from the general-purpose and floating-point registers. The load/store unit provides all 
logic required to calculate effective addresses, handles data alignment to and from the data 
cache, and provides sequencing for load and store string and multiple operations. Write 
operations to the data cache can be performed on a byte, half-word, word, or double-word 
basis. 


The instruction cache provides a 128-bit interface to the instruction unit, so four 
instructions can be made available to the instruction unit in a single clock cycle. The 
instruction unit accesses the instruction cache frequently in order to sustain the high 
throughput provided by the six-entry instruction queue. 


3.1 Data Cache Organization


The data cache is organized as 128 sets of eight blocks as shown in Figure 3-2. Each block
consists of 32 bytes, two state bits, and an address tag. Note that in the PowerPC
architecture, the term ‘cache block,’ or simply ‘block,’ when used in the context of cache
implementations, refers to the unit of memory at which coherency is maintained. For the
MPC750, this is the eight-word cache line. This value may be different for other processors
in the family.


Each cache block contains eight contiguous words from memory that are loaded from an
eight-word boundary (that is, bits A[27-31] of the logical (effective) addresses are zero); as
a result, cache blocks are aligned with page boundaries. Note that address bits A[20-26]
provide the index to select a cache set. Bits A[27-31] select a byte within a block. The two
state bits implement a three-state MEI (modified/exclusive/invalid) protocol, a coherent
subset of the standard four-state MESI (modified/exclusive/shared/invalid) protocol. The
MEI protocol is described in Section 3.3.2, “MEI Protocol.” The tags consist of bits
PA[0-19]. Address translation occurs in parallel with set selection (from A[20-26]), and the
higher-order address bits (the tag bits in the cache) are physical.
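The address fields described above can be extracted as in the following sketch, which is illustrative only; the function and constant names are hypothetical. It relies on the fact that the low-order 12 address bits (A[20-31]) are the same in the effective and physical address, which is why set selection can proceed in parallel with translation.

```python
SETS = 128          # selected by A[20-26], a 7-bit field
WAYS = 8
BLOCK_BYTES = 32    # byte within the block selected by A[27-31], a 5-bit field


def split_address(effective: int, physical: int) -> dict:
    """Split an access into set index, byte offset, and physical tag."""
    return {
        "set_index": (effective >> 5) & 0x7F,        # A[20-26]
        "byte_offset": effective & 0x1F,             # A[27-31]
        "tag": (physical & 0xFFFFFFFF) >> 12,        # PA[0-19]
    }


fields = split_address(0x00001234, 0xABCDE234)
cache_bytes = SETS * WAYS * BLOCK_BYTES   # 32 Kbytes
```

The product of sets, ways, and block size reproduces the 32-Kbyte cache size, and the 20-bit tag plus the 12 untranslated bits account for the full 32-bit physical address.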


The MPC750’s on-chip data cache tags are single-ported, and load or store operations must
be arbitrated with snoop accesses to the data cache tags. Load or store operations can be
performed to the cache on the clock cycle immediately following a snoop access if the
snoop misses; snoop hits may block the data cache for two or more cycles, depending on
whether a copy-back to main memory is required.

[Diagram not reproduced. It shows the data cache as 128 sets, each holding Block 0 through
Block 7; every block has an address tag (Address Tag 0 through 7), state bits, and
Words [0-7], for 8 words per block.]

Figure 3-2. Data Cache Organization


3.2 Instruction Cache Organization 


The instruction cache also consists of 128 sets of eight blocks, as shown in Figure 3-3. Each
block consists of 32 bytes, a single state bit, and an address tag. As with the data cache, each
instruction cache block contains eight contiguous words from memory that are loaded from
an eight-word boundary (that is, bits A[27-31] of the logical (effective) addresses are zero);
as a result, cache blocks are aligned with page boundaries. Also, address bits A[20-26]
provide the index to select a set, and bits A[27-29] select a word within a block.


The tags consist of bits PA[0-19]. Address translation occurs in parallel with set selection
(from A[20-26]), and the higher-order address bits (the tag bits in the cache) are physical.


The instruction cache differs from the data cache in that it does not implement MEI cache 
coherency protocol, and a single state bit is implemented that indicates only whether a 
cache block is valid or invalid. The instruction cache is not snooped, so if a processor 
modifies a memory location that may be contained in the instruction cache, software must 
ensure that such memory updates are visible to the instruction fetching mechanism. This 
can be achieved with the following instruction sequence: 
































    dcbst   update memory
    sync    wait for update
    icbi    remove (invalidate) copy in instruction cache
    sync    wait for icbi operation to be globally performed
    isync   remove copy in own instruction buffer








These operations are necessary because the processor does not maintain instruction 
memory coherent with data memory. Software is responsible for enforcing coherency of 
instruction caches and data memory. Since instruction fetching may bypass the data cache, 
changes made to items in the data cache may not be reflected in memory until after the 
instruction fetch completes. 
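Because dcbst and icbi each operate on a single 32-byte cache block, software that modifies a range of instructions typically needs the list of blocks that the range touches. The following sketch computes those block addresses; the helper name blocks_covering is hypothetical, and applying dcbst and icbi to each address, followed by the sync and isync steps, remains the responsibility of the surrounding code.

```python
BLOCK_SIZE = 32   # bytes per cache block on the MPC750


def blocks_covering(start: int, length: int) -> list:
    """Return the block-aligned addresses spanned by [start, start + length)."""
    if length <= 0:
        return []
    first = start & ~(BLOCK_SIZE - 1)
    last = (start + length - 1) & ~(BLOCK_SIZE - 1)
    return list(range(first, last + 1, BLOCK_SIZE))


# An 8-byte patch that straddles a block boundary touches two blocks.
touched = blocks_covering(0x101C, 8)
```

A patch that crosses a 32-byte boundary therefore needs the store and invalidate operations applied to both blocks, even if only a few bytes changed.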





[Diagram not reproduced. It shows the instruction cache as 128 sets, each holding Block 0
through Block 7; every block has an address tag, a valid bit, and Words [0-7].]

Figure 3-3. Instruction Cache Organization


3.3 Memory and Cache Coherency 


The primary objective of a coherent memory system is to provide the same image of 
memory to all devices using the system. Coherency allows synchronization and cooperative 
use of shared resources. Otherwise, multiple copies of a memory location, some containing 
stale values, could exist in a system resulting in errors when the stale values are used. Each 
potential bus master must follow rules for managing the state of its cache. This section 
describes the coherency mechanisms of the PowerPC architecture and the three-state cache 
coherency protocol of the MPC750 data cache. 


Note that unless specifically noted, the discussion of coherency in this section applies to the
MPC750’s data cache only. The instruction cache is not snooped. Instruction cache
coherency must be maintained by software. However, the MPC750 does support a fast
instruction cache invalidate capability as described in Section 3.4.1.4, “Instruction Cache
Flash Invalidation.”


3.3.1 Memory/Cache Access Attributes (WIMG Bits)


Some memory characteristics can be set on either a block or page basis by using the WIMG 
bits in the BAT registers or page table entry (PTE), respectively. The WIMG attributes 
control the following functionality: 


• Write-through (W bit)
• Caching-inhibited (I bit)
• Memory coherency (M bit)
• Guarded memory (G bit)


These bits allow both uniprocessor and multiprocessor system designs to exploit numerous 
system-level performance optimizations. 


The WIMG attributes are programmed by the operating system for each page and block. 
The W and I attributes control how the processor performing an access uses its own cache.
The M attribute ensures that coherency is maintained for all copies of the addressed 
memory location. The G attribute prevents out-of-order loading and prefetching from the 
addressed memory location. 


The WIMG attributes occupy four bits in the BAT registers for block address translation 
and in the PTEs for page address translation. The WIMG bits are programmed as follows: 


• The operating system uses the mtspr instruction to program the WIMG bits in the
  BAT registers for block address translation. The IBAT register pairs do not have a
  G bit and all accesses that use the IBAT register pairs are considered not guarded.

• The operating system writes the WIMG bits for each page into the PTEs in system
  memory as it sets up the page tables.


When an access requires coherency, the processor performing the access must inform the 
coherency mechanisms throughout the system that the access requires memory coherency. 
The M attribute determines the kind of access performed on the bus (global or local). 


Software must exercise care with respect to the use of these bits if coherent memory support 
is desired. Careless specification of these bits may create situations that present coherency 
paradoxes to the processor. In particular, this can happen when the state of these bits is 
changed without appropriate precautions (such as flushing the pages that correspond to the 
changed bits from the caches of all processors in the system) or when the address 
translations of aliased real addresses specify different values for any of the WIMG bits. 
These coherency paradoxes can occur within a single processor or across several 
processors. It is important to note that in the presence of a paradox, the operating system 
software is responsible for correctness. 


For real addressing mode (that is, for accesses performed with address translation
disabled: MSR[IR] = 0 or MSR[DR] = 0 for instruction or data access, respectively), the
WIMG bits are automatically generated as 0b0011 (the data is write-back, caching is
enabled, memory coherency is enforced, and memory is guarded).
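The meaning of a 4-bit WIMG value can be made concrete with a small decoder. This is an illustrative sketch; the function name decode_wimg and the dictionary keys are hypothetical, and W is taken as the most-significant of the four bits, following the order in which the name WIMG lists them.

```python
def decode_wimg(wimg: int) -> dict:
    """Decode a 4-bit WIMG value, with W as the most-significant bit."""
    return {
        "write_through": bool(wimg & 0b1000),      # W
        "caching_inhibited": bool(wimg & 0b0100),  # I
        "memory_coherent": bool(wimg & 0b0010),    # M
        "guarded": bool(wimg & 0b0001),            # G
    }


# Attributes generated automatically in real addressing mode
real_mode = decode_wimg(0b0011)
```

Decoding 0b0011 gives write-back (W clear), caching enabled (I clear), coherency enforced (M set), and guarded (G set), which agrees with the description of real addressing mode above.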




3.3.2 MEI Protocol 


The MPC750 data cache coherency protocol is a coherent subset of the standard MESI
four-state cache protocol that omits the shared state. The MPC750’s data cache
characterizes each 32-byte block it contains as being in one of three MEI states. Addresses
presented to the cache are indexed into the cache directory with bits A[20-26], and the
upper-order 20 bits from the physical address translation (PA[0-19]) are compared against
the indexed cache directory tags. If none of the indexed tags matches, the result is a cache
miss. If a tag matches, a cache hit occurred and the directory indicates the state of the cache
block through two state bits kept with the tag. The three possible states for a cache block in
the cache are the modified state (M), the exclusive state (E), and the invalid state (I). The
three MEI states are defined in Table 3-1.


Table 3-1. MEI State Definitions 





MEI State | Definition

Modified (M)
    The addressed cache block is present in the cache, and is modified with respect to
    system memory; that is, the modified data in the cache block has not been written back
    to memory. The cache block may be present in the MPC750’s L2 cache, but it is not
    present in any other coherent cache.

Exclusive (E)
    The addressed cache block is present in the cache, and this cache has exclusive
    ownership of the addressed block. The addressed block may be present in the
    MPC750’s L2 cache, but it is not present in any other processor's cache. The data in this
    cache block is consistent with system memory.

Invalid (I)
    This state indicates that the addressed block does not contain valid data or that the
    addressed cache block is not resident in the cache.











The MPC750 provides dedicated hardware to provide memory coherency by snooping bus 
transactions. Figure 3-4 shows the MEI cache coherency protocol, as enforced by the 
MPC750. Figure 3-4 assumes that the WIM bits for the page or block are set to 001; that is, 
write-back, caching-not-inhibited, and memory coherency enforced. 


Because data cannot be shared, the MPC750 signals all cache block fills as if they were
write misses (read-with-intent-to-modify), which flushes the corresponding copies of the
data in all caches external to the MPC750 prior to the cache-block-fill operation. Following
the cache block load, the MPC750 is the exclusive owner of the data and may write to it
without a bus broadcast transaction.

[State diagram not reproduced. It shows the transitions among the MEI states, labeled
RH, WH, and SH/CIR among others, with bus transactions marked as snoop push or cache
block fill. Legend: SH = Snoop Hit; RH = Read Hit; RM = Read Miss; WH = Write Hit;
WM = Write Miss; SH/CRW = Snoop Hit, Cacheable Read/Write.]

Figure 3-4. MEI Cache Coherency Protocol—State Diagram (WIM = 001)


To maintain the three-state coherency, all global reads observed on the bus by the MPC750 
are snooped as if they were writes, causing the MPC750 to flush the cache block (write the 
cache block back to memory and invalidate the cache block if it is modified, or simply 
invalidate the cache block if it is unmodified). The exception to this rule occurs when a 
snooped transaction is a caching-inhibited read (either burst or single-beat, where TT[0-4] 
= X1010; see Table 7-1 for clarification), in which case the MPC750 does not invalidate the 
snooped cache block. If the cache block is modified, the block is written back to memory, 
and the cache block is marked exclusive. If the cache block is marked exclusive, no bus 
action is taken, and the cache block remains in the exclusive state. This treatment of 
caching-inhibited reads decreases the possibility of data thrashing by allowing noncaching 
devices to read data without invalidating the entry from the MPC750’s data cache. 
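The snoop behavior described in this paragraph can be summarized in a small state function. This is a sketch of the text above rather than of the complete protocol: it covers only snooped global reads, and the name snoop_read and its return convention are hypothetical.

```python
def snoop_read(state: str, caching_inhibited: bool = False) -> tuple:
    """Return (next_state, block_written_back) for a snooped global read.

    state is one of "M", "E", or "I".
    """
    if state == "I":
        return ("I", False)            # snoop miss: nothing to do
    written_back = (state == "M")      # modified data is pushed to memory
    if caching_inhibited:
        # Caching-inhibited read: the block is kept and ends up exclusive.
        return ("E", written_back)
    # Any other global read is treated like a write: the block is flushed.
    return ("I", written_back)


after_normal = snoop_read("M")
after_inhibited = snoop_read("M", caching_inhibited=True)
```

The only difference between the two cases is the final state: an ordinary global read leaves the block invalid, while a caching-inhibited read leaves it resident in the exclusive state.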


Section 3.8, “MEI State Transactions,” provides a detailed list of MEI transitions for
various operations and WIM bit settings.


3.3.2.1 MEI Hardware Considerations


While the MPC750 provides the hardware required to monitor bus traffic for coherency, the 
MPC750 data cache tags are single-ported, and a simultaneous load/store and snoop access 
represents a resource conflict. In general, the snoop access has highest priority and is given 
first access to the tags. The load or store access will then occur on the clock following the 
snoop. The snoop is not given priority into the tags when the snoop coincides with a tag 
write (for example, validation after a cache block load). In these situations, the snoop is 
retried and must re-arbitrate before the lookup is possible. 


Occasionally, cache snoops cannot be serviced and must be retried. These retries occur if 
the cache is busy with a burst read or write when the snoop operation takes place. 


Note that it is possible for a snoop to hit a modified cache block that is already in the process 
of being written to the copy-back buffer for replacement purposes. If this happens, the 
MPC750 retries the snoop, and raises the priority of the castout operation to allow it to go 
to the bus before the cache block fill. 


Another consideration is page table aliasing. If a store hits to a modified cache block but
the page table entry is marked write-through (WIMG = 1xxx), then the page has probably
been aliased through another page table entry which is marked write-back (WIMG = 0xxx).
If this occurs, the MPC750 ignores the modified bit in the cache tag. The cache block is
updated during the write-through operation and the block remains in the modified state.


The global (GBL) signal, asserted as part of the address attribute field during a bus 
transaction, enables the snooping hardware of the MPC750. Address bus masters assert 
GBL to indicate that the current transaction is a global access (that is, an access to memory 
shared by more than one device). If GBL is not asserted for the transaction, that transaction 
is not snooped by the MPC750. Note that the GBL signal is not asserted for instruction 
fetches, and that GBL is asserted for all data read or write operations when using real 
addressing mode (that is, address translation is disabled). 














Normally, GBL reflects the M-bit value specified for the memory reference in the 
corresponding translation descriptor(s). Care should be taken to minimize the number of 
pages marked as global, because the retry protocol enforces coherency and can use 
considerable bus bandwidth if much data is shared. Therefore, available bus bandwidth 
decreases as more memory is marked as global. 


The MPC750 snoops a transaction if the transfer start (TS) and GBL signals are asserted 
together in the same bus clock (this is a qualified snooping condition). No snoop update to 
the MPC750 cache occurs if the snooped transaction is not marked global. Also, because 
cache block castouts and snoop pushes do not require snooping, the GBL signal is not 
asserted for these operations. 





When the MPC750 detects a qualified snoop condition, the address associated with the TS 
signal is compared with the cache tags. Snooping finishes if no hit is detected. If, however, 
the address hits in the cache, the MPC750 reacts according to the MEI protocol shown in
Figure 3-4. 


3.3.3 Coherency Precautions in Single Processor Systems 


The following coherency paradoxes can be encountered within a single-processor system: 
• Load or store to a caching-inhibited page (WIMG = x1xx) and a cache hit occurs. 


The MPC750 ignores any hits to a cache block in a memory space marked 
caching-inhibited (WIMG = x1xx). The access is performed on the external bus as 
if there were no hit. The data in the cache is not pushed, and the cache block is not 
invalidated. 


• Store to a page marked write-through (WIMG = 1xxx) and a cache hit occurs to a 
modified cache block. 


The MPC750 ignores the modified bit in the cache tag. The cache block is updated 
during the write-through operation but the block remains in the modified state (M). 


Note that when WIM bits are changed in the page tables or BAT registers, it is critical that 
the cache contents reflect the new WIM bit settings. For example, if a block or page that 
had allowed caching becomes caching-inhibited, software should ensure that the 
appropriate cache blocks are flushed to memory and invalidated. 


3.3.4 Coherency Precautions in Multiprocessor Systems 


The MPC750’s three-state coherency protocol permits no data sharing between the 
MPC750 and other caches. All burst reads initiated by the MPC750 are performed as read 
with intent to modify. Burst snoops are interpreted as read with intent to modify or read 
with no intent to cache. This effectively places all caches in the system into a three-state 
coherency scheme. Four-state caches may share data amongst themselves but not with the 
MPC750. 


3.3.5 MPC750-Initiated Load/Store Operations 


Load and store operations are assumed to be weakly ordered on the MPC750. The 
load/store unit (LSU) can perform load operations that occur later in the program ahead of 
store operations, even when the data cache is disabled (see Section 3.3.5.2, “Sequential 
Consistency of Memory Accesses”). However, strongly ordered load and store operations 
can be enforced through the setting of the I bit (of the page WIMG bits) when address 
translation is enabled. Note that when address translation is disabled (real addressing 
mode), the default WIMG bits cause the I bit to be cleared (accesses are assumed to be 
cacheable), and thus the accesses are weakly ordered. Refer to Section 5.2, “Real 
Addressing Mode,” for a description of the WIMG bits when address translation is disabled. 


The MPC750 does not provide support for direct-store segments. Operations attempting to 
access a direct-store segment will invoke a DSI exception. For additional information about 
DSI exceptions, refer to Section 4.5.3, “DSI Exception (0x00300).” 


3.3.5.1 Performed Loads and Stores 


The PowerPC architecture defines a performed load operation as one that has the addressed 
memory location bound to the target register of the load instruction. The architecture 
defines a performed store operation as one where the stored value is the value that any other 
processor will receive when executing a load operation (until, of course, it is changed 
again). With respect to the MPC750, caching-allowed (WIMG = x0xx) loads and 
caching-allowed, write-back (WIMG = 00xx) stores are performed when they have 
arbitrated to address the cache block. Note that in the event of a cache miss, these storage 
operations may place a memory request into the processor’s memory queue, but such 
operations are considered an extension to the state of the cache with respect to snooping 
bus operations. Caching-inhibited (WIMG = x1xx) loads, caching-inhibited (WIMG = 
x1xx) stores, and write-through (WIMG = 1xxx) stores are performed when they have been 
successfully presented to the external 60x bus. 
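

The classification above depends only on the access type and the W and I bits. The following Python sketch illustrates it under the assumption that WIMG is passed as a 4-bit integer with W as the most significant bit; `decode_wimg` and `performed_point` are hypothetical helper names.

```python
def decode_wimg(wimg):
    """Split a 4-bit WIMG value (W is the most significant bit) into flags."""
    return {
        "W": bool(wimg & 0b1000),  # write-through
        "I": bool(wimg & 0b0100),  # caching-inhibited
        "M": bool(wimg & 0b0010),  # memory coherence required
        "G": bool(wimg & 0b0001),  # guarded
    }


def performed_point(access, wimg):
    """Return where a 'load' or 'store' is considered performed.

    'cache'   -> when it has arbitrated to address the cache block
    '60x bus' -> when it has been successfully presented to the external bus
    """
    if access not in ("load", "store"):
        raise ValueError("access must be 'load' or 'store'")
    bits = decode_wimg(wimg)
    if bits["I"]:
        return "60x bus"          # caching-inhibited loads and stores
    if access == "store" and bits["W"]:
        return "60x bus"          # write-through stores
    return "cache"                # caching-allowed loads, write-back stores
```

Note that a load from a write-through page (WIMG = 10xx) is still performed at the cache, because only the I bit matters for loads.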


3.3.5.2 Sequential Consistency of Memory Accesses 


The PowerPC architecture requires that all memory operations executed by a single 
processor be sequentially consistent with respect to that processor. This means that all 
memory accesses appear to be executed in program order with respect to exceptions and 
data dependencies. 


The MPC750 achieves sequential consistency by operating a single pipeline to the 
cache/MMU. All memory accesses are presented to the MMU in exact program order and 
therefore exceptions are determined in order. Loads are allowed to bypass stores once 
exception checking has been performed for the store, but data dependency checking is 
handled in the load/store unit so that a load will not bypass a store with an address match. 
Note that although memory accesses that miss in the cache are forwarded to the memory 
queue for future arbitration for the external bus, all potential synchronous exceptions have 
been resolved before the cache. In addition, although subsequent memory accesses can 
address the cache, full coherency checking between the cache and the memory queue is 
provided to avoid dependency conflicts. 


3.3.5.3 Atomic Memory References 


The PowerPC architecture defines the Load Word and Reserve Indexed (lwarx) and the 
Store Word Conditional Indexed (stwcx.) instructions to provide an atomic update function 
for a single, aligned word of memory. These instructions can be used to develop a rich set 
of multiprocessor synchronization primitives. Note that atomic memory references 
constructed using lwarx/stwcx. instructions depend on the presence of a coherent memory 
system for correct operation. These instructions should not be expected to provide atomic 
access to noncoherent memory. For detailed information on these instructions, refer to 
Chapter 2, “Programming Model,” in this book and Chapter 8, “Instruction Set,” in the 
Programming Environments Manual. 


The lwarx instruction performs a load word from memory operation and creates a 
reservation for the 32-byte section of memory that contains the accessed word (that is, the 
reservation granularity is 32 bytes). The lwarx instruction makes a nonspecific reservation 
with respect to the executing processor and a specific reservation with respect to other 
masters. This means that any subsequent stwcx. executed by the same processor, regardless 
of address, will cancel the reservation. Also, any bus write or invalidate operation from 
another processor to an address that matches the reservation address will cancel the 
reservation. 


The stwcx. instruction does not check the reservation for a matching address. The stwcx. 
instruction is only required to determine whether a reservation exists. The stwcx. 
instruction performs a store word operation only if the reservation exists. If the reservation 
has been cancelled for any reason, then the stwcx. instruction fails and clears the CR0[EQ] 
bit in the condition register. The architectural intent is to follow the lwarx/stwcx. 
instruction pair with a conditional branch which checks to see whether the stwcx. 
instruction failed. 
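

The reservation behavior described above can be illustrated with a small software model. This is a sketch only: `ReservationModel`, `snoop_write`, and `atomic_add` are hypothetical names, memory is a plain dictionary, and the return value of `stwcx` stands in for CR0[EQ]. The retry loop in `atomic_add` plays the role of the conditional branch that follows an lwarx/stwcx. pair.

```python
RESERVATION_GRANULE = 32  # bytes, as stated in the text


class ReservationModel:
    """Toy model of one processor's lwarx/stwcx. reservation."""

    def __init__(self):
        self.memory = {}              # word address -> value
        self.reserved_granule = None  # None means no reservation exists

    def lwarx(self, addr):
        """Load a word and reserve the 32-byte granule containing it."""
        self.reserved_granule = addr // RESERVATION_GRANULE
        return self.memory.get(addr, 0)

    def stwcx(self, addr, value):
        """Store only if a reservation exists; return the modeled CR0[EQ]."""
        had_reservation = self.reserved_granule is not None
        # Any stwcx. by this processor cancels the reservation,
        # and the reservation address is not compared with addr.
        self.reserved_granule = None
        if had_reservation:
            self.memory[addr] = value
        return had_reservation

    def snoop_write(self, addr):
        """A write or invalidate by another master to a matching granule."""
        if self.reserved_granule == addr // RESERVATION_GRANULE:
            self.reserved_granule = None


def atomic_add(cpu, addr, delta, before_store=None):
    """Retry lwarx/stwcx. until the conditional store succeeds."""
    attempts = 0
    while True:
        attempts += 1
        old = cpu.lwarx(addr)
        if before_store is not None:
            before_store(attempts)  # hook used to inject interference
        if cpu.stwcx(addr, old + delta):
            return old + delta, attempts
```

If another master writes anywhere in the reserved 32-byte granule between the two instructions, the first stwcx. fails and the loop retries.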


If the page table entry is marked caching-allowed (WIMG = x0xx), and an lwarx access 
misses in the cache, then the MPC750 performs a cache block fill. If the page is marked 
caching-inhibited (WIMG = x1xx) or the cache is locked, and the access misses, then the 
lwarx instruction appears on the bus as a single-beat load. All bus operations that are a 
direct result of either an lwarx instruction or an stwcx. instruction are placed on the bus 
with a special encoding. Note that this does not force all lwarx instructions to generate bus 
transactions, but rather provides a means for identifying when an lwarx instruction does 
generate a bus transaction. If an implementation requires that all lwarx instructions 
generate bus transactions, then the associated pages should be marked as caching-inhibited. 


The state of the reservation is always presented onto the RSRV output signal. This can be 
used to determine when an internal condition has caused a change in the reservation state. 


The MPC750’s data cache treats all stwcx. operations as write-through independent of the 
WIMG settings. However, if the stwcx. operation hits in the MPC750’s L2 cache, then the 
operation completes with the reservation intact in the L2 cache. See Chapter 9, “L2 Cache 
Interface Operation,” for more information. Otherwise, the stwcx. operation continues to 
the bus interface unit for completion. When the write-through operation completes 
successfully, either in the L2 cache or on the 60x bus, then the data cache entry is updated 
(assuming it hits), and CR0[EQ] is modified to reflect the success of the operation. If the 
reservation is not intact, the stwcx. completes in the bus interface unit without performing 
a bus transaction, and without modifying either of the caches. 


3.4 Cache Control 


The MPC750’s L1 caches are controlled by programming specific bits in the HID0 
special-purpose register and by issuing dedicated cache control instructions. Section 3.4.1, 
“Cache Control Parameters in HID0,” describes the HID0 cache control bits, and 
Section 3.4.2, “Cache Control Instructions,” describes the cache control instructions. 


3.4.1 Cache Control Parameters in HID0 


The HID0 special-purpose register contains several bits that invalidate, disable, and lock 
the instruction and data caches. The following sections describe these facilities. 


3.4.1.1 Data Cache Flash Invalidation 


The data cache is automatically invalidated when the MPC750 is powered up and during a 
hard reset. However, a soft reset does not automatically invalidate the data cache. Software 
must use the HID0 data cache flash invalidate bit (HID0[DCFI]) if data cache invalidation 
is desired after a soft reset. Once HID0[DCFI] is set through an mtspr operation, the 
MPC750 automatically clears this bit in the next clock cycle (provided that the data cache 
is enabled in the HID0 register). 


Note that some microprocessors that implement the PowerPC architecture accomplish data 
cache flash invalidation by setting and clearing HID0[DCFI] with two consecutive mtspr 
instructions (that is, the bit is not automatically cleared by the microprocessor). Software 
that has this sequence of operations does not need to be changed to run on the MPC750. 
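

The compatibility point can be illustrated with a toy register model. This is a sketch under several assumptions that are not stated in this section: the bit masks `HID0_DCE` and `HID0_DCFI` are assumed values and should be checked against the HID0 register description, `Hid0Model` is a hypothetical name, and the model assumes that writing DCFI = 1 always triggers the invalidation while the automatic clear happens only when the data cache is enabled.

```python
HID0_DCE = 1 << 14    # assumed mask for the data cache enable bit
HID0_DCFI = 1 << 10   # assumed mask for the data cache flash invalidate bit


class Hid0Model:
    """Toy model of HID0[DCFI] handling; not a description of the hardware."""

    def __init__(self, value=0):
        self.value = value
        self.valid_blocks = set()  # stand-in for valid data cache blocks
        self.invalidations = 0

    def mtspr(self, new_value):
        """Model an mtspr write to HID0."""
        self.value = new_value
        if new_value & HID0_DCFI:
            # Flash invalidate: every block becomes invalid.
            self.valid_blocks.clear()
            self.invalidations += 1
            if new_value & HID0_DCE:
                # The bit is cleared automatically when the cache is enabled.
                self.value &= ~HID0_DCFI


def flash_invalidate_single_write(hid0):
    """MPC750-style sequence: one mtspr that sets DCFI."""
    hid0.mtspr(hid0.value | HID0_DCFI)


def flash_invalidate_set_then_clear(hid0):
    """Legacy sequence: set DCFI, then clear it with a second mtspr."""
    hid0.mtspr(hid0.value | HID0_DCFI)
    hid0.mtspr(hid0.value & ~HID0_DCFI)
```

Both sequences leave DCFI clear and the cache invalidated exactly once in this model, which is why the legacy two-instruction sequence needs no change.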


3.4.1.2 Data Cache Enabling/Disabling 


The data cache may be enabled or disabled by using the data cache enable bit, HID0[DCE]. 
HID0[DCE] is cleared on power-up, disabling the data cache. 


When the data cache is in the disabled state (HID0[DCE] = 0), the cache tag state bits are 
ignored, and all accesses are propagated to the L2 cache or 60x bus as single-beat 
transactions. Note that the CI (cache inhibit) signal always reflects the state of the 
caching-inhibited memory/cache access attribute (the I bit) independent of the state of 
HID0[DCE]. Also note that disabling the data cache does not affect the translation logic; 
translation for data accesses is controlled by MSR[DR]. 


The setting of the DCE bit must be preceded by a sync instruction to prevent the cache from 
being enabled or disabled in the middle of a data access. In addition, the cache must be 
globally flushed before it is disabled to prevent coherency problems when it is re-enabled. 


Snooping is not performed when the data cache is disabled. 


The dcbz instruction will cause an alignment exception when the data cache is disabled. 
The touch load (dcbt and dcbtst) instructions are no-ops when the data cache is disabled. 
Other cache operations (caused by the dcbf, dcbst, and dcbi instructions) are not affected 
by disabling the cache. This can potentially cause coherency errors. For example, a dcbf 
instruction that hits a modified cache block in the disabled cache will cause a copyback to 
memory of potentially stale data. 


3.4.1.3 Data Cache Locking 


The contents of the data cache can be locked by setting the data cache lock bit, 
HID0[DLOCK]. A data access that hits in a locked data cache is serviced by the cache. 
However, all accesses that miss in the locked cache are propagated to the L2 cache or 60x 
bus as single-beat transactions. Note that the CI signal always reflects the state of the 
caching-inhibited memory/cache access attribute (the I bit) independent of the state of 
HID0[DLOCK]. 


The MPC750 treats snoop hits to a locked data cache the same as snoop hits to an unlocked 
data cache. However, any cache block invalidated by a snoop hit remains invalid until the 
cache is unlocked. 


The setting of the DLOCK bit must be preceded by a sync instruction to prevent the data 
cache from being locked during a data access. 


3.4.1.4 Instruction Cache Flash Invalidation 


The instruction cache is automatically invalidated when the MPC750 is powered up and 
during a hard reset. However, a soft reset does not automatically invalidate the instruction 
cache. Software must use the HID0 instruction cache flash invalidate bit (HID0[ICFI]) if 
instruction cache invalidation is desired after a soft reset. Once HID0[ICFI] is set through 
an mtspr operation, the MPC750 automatically clears this bit in the next clock cycle 
(provided that the instruction cache is enabled in the HID0 register). 


Note that some microprocessors that implement the PowerPC architecture accomplish 
instruction cache flash invalidation by setting and clearing HID0[ICFI] with two 
consecutive mtspr instructions (that is, the bit is not automatically cleared by the 
microprocessor). Software that has this sequence of operations does not need to be changed 
to run on the MPC750. 


3.4.1.5 Instruction Cache Enabling/Disabling 


The instruction cache may be enabled or disabled through the use of the instruction cache 
enable bit, HID0[ICE]. HID0[ICE] is cleared on power-up, disabling the instruction cache. 


When the instruction cache is in the disabled state (HID0[ICE] = 0), the cache tag state bits 
are ignored, and all instruction fetches are propagated to the L2 cache or 60x bus as 
single-beat transactions. Note that the CI signal always reflects the state of the 
caching-inhibited memory/cache access attribute (the I bit) independent of the state of 
HID0[ICE]. Also note that disabling the instruction cache does not affect the translation 
logic; translation for instruction accesses is controlled by MSR[IR]. 


The setting of the ICE bit must be preceded by an isync instruction to prevent the cache 
from being enabled or disabled in the middle of an instruction fetch. In addition, the cache 
must be globally flushed before it is disabled to prevent coherency problems when it is 
re-enabled. The icbi instruction is not affected by disabling the instruction cache. 


3.4.1.6 Instruction Cache Locking 


The contents of the instruction cache can be locked by setting the instruction cache lock bit, 
HID0[ILOCK]. An instruction fetch that hits in a locked instruction cache is serviced by 
the cache. However, all accesses that miss in the locked cache are propagated to the L2 
cache or 60x bus as single-beat transactions. Note that the CI signal always reflects the state 
of the caching-inhibited memory/cache access attribute (the I bit) independent of the state 
of HID0[ILOCK]. 


The setting of the ILOCK bit must be preceded by an isync instruction to prevent the 
instruction cache from being locked during an instruction fetch. 


3.4.2 Cache Control Instructions 


The PowerPC architecture defines instructions for controlling both the instruction and data 
caches (when they exist). The cache control instructions, dcbt, dcbtst, dcbz, dcbst, dcbf, 
dcbi, and icbi, are intended for the management of the local L1 and L2 caches. The 
MPC750 interprets the cache control instructions as if they pertain only to its own L1 or L2 
caches. These instructions are not intended for managing other caches in the system (except 
to the extent necessary to maintain coherency). 


The MPC750 does not snoop cache control instruction broadcasts, except for dcbz when 
M = 1. The dcbz instruction is the only cache control instruction that causes a broadcast on 
the 60x bus (when M = 1) to maintain coherency. All other data cache control instructions 
(dcbi, dcbf, dcbst and dcbz) are not broadcast, unless broadcast is enabled through the 
HID0[ABE] configuration bit. Note that dcbi, dcbf, dcbst and dcbz do broadcast to the 
MPC750’s L2 cache, regardless of HID0[ABE]. The icbi instruction is never broadcast. 


3.4.2.1 Data Cache Block Touch (dcbt) and Data Cache Block Touch for Store (dcbtst) 


The Data Cache Block Touch (dcbt) and Data Cache Block Touch for Store (dcbtst) 
instructions provide potential system performance improvement through the use of 
software-initiated prefetch hints. The MPC750 treats these instructions identically (that is, 
a dcbtst instruction behaves exactly the same as a dcbt instruction on the MPC750). Note 
that processor implementations are not required to take any action based on the execution 
of these instructions, but they may choose to prefetch the cache block corresponding to the 
effective address into their cache. 


The MPC750 loads the data into the cache when the address hits in the TLB or the BAT, is 
permitted load access from the addressed page, is not directed to a direct-store segment, and 
is directed at a cacheable page. Otherwise, the MPC750 treats these instructions as no-ops. 
The data brought into the cache as a result of this instruction is validated in the same manner 
that a load instruction would be (that is, it is marked as exclusive). The memory reference 
of a dcbt (or dcbtst) instruction causes the reference bit to be set. Note also that the 
successful execution of the dcbt (or dcbtst) instruction affects the state of the TLB and 
cache LRU bits as defined by the PLRU algorithm. 


3.4.2.2 Data Cache Block Zero (dcbz) 


The effective address is computed, translated, and checked for protection violations as 
defined in the PowerPC architecture. The dcbz instruction is treated as a store to the 
addressed byte with respect to address translation and protection. 


If the block containing the byte addressed by the EA is in the data cache, all bytes are 
cleared, and the tag is marked as modified (M). If the block containing the byte addressed 
by the EA is not in the data cache and the corresponding page is caching-allowed, the block 
is established in the data cache without fetching the block from main memory, and all bytes 
of the block are cleared, and the tag is marked as modified (M). 


If the contents of the cache block are from a page marked memory coherence required 
(M = 1), an address-only bus transaction is run prior to clearing the cache block. The dcbz 
instruction is the only cache control instruction that causes a broadcast on the 60x bus 
(when M = 1) to maintain coherency. The other cache control instructions are not broadcast 
unless broadcasting is specifically enabled through the HID0[ABE] configuration bit. 


The dcbz instruction executes regardless of whether the cache is locked, but if the cache is 
disabled, an alignment exception is generated. If the page containing the byte addressed by 
the EA is caching-inhibited or write-through, then the system alignment exception handler 
is invoked. BAT and TLB protection violations generate DSI exceptions. 
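

The dcbz cases described in this section can be collected into one decision function. The sketch below is illustrative only: `dcbz_outcome` is a hypothetical name, WIMG is passed as a 4-bit integer with W as the most significant bit, and checking the protection violation before the alignment conditions is an assumption (the relative priority of the two exceptions is not stated here).

```python
def dcbz_outcome(cache_enabled, wimg, hit, protection_violation=False):
    """Return the list of actions a dcbz would cause in this model.

    Cache locking is deliberately not a parameter: dcbz executes
    regardless of whether the cache is locked.
    """
    # Assumed ordering: translation/protection is checked first.
    if protection_violation:
        return ["DSI exception"]

    write_through = bool(wimg & 0b1000)
    caching_inhibited = bool(wimg & 0b0100)
    coherence_required = bool(wimg & 0b0010)

    if not cache_enabled or caching_inhibited or write_through:
        return ["alignment exception"]

    actions = []
    if coherence_required:
        # M = 1: an address-only transaction runs before the block is cleared.
        actions.append("address-only bus transaction")
    if not hit:
        actions.append("establish block without fetching from memory")
    actions.append("clear all bytes")
    actions.append("mark block modified (M)")
    return actions
```

The function makes the software consequence visible: code that uses dcbz to clear memory must be certain the target pages are caching-allowed and write-back, and that the data cache is enabled.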


Both the MPC750 and MPC755 processors require protection in the use of the dcbz 
instruction in order to guarantee cache coherency in a multiprocessor system. Specifically, 
the dcbz instruction must be: 


• Either enveloped by high-level software synchronization protocols (such as 
semaphores), or 
• Preceded by execution of a dcbf instruction to the same address. 


One of these precautions must be taken in order to guarantee that there are no simultaneous 
cache hits from a dcbz instruction and a snoop to that address. If these two events occur 
simultaneously, stale data may occur, causing system failures. 




3.4.2.3 Data Cache Block Store (dcbst) 


The effective address is computed, translated, and checked for protection violations as 
defined in the PowerPC architecture. This instruction is treated as a load with respect to 
address translation and memory protection. 


If the address hits in the cache and the cache block is in the exclusive (E) state, no action is 
taken. If the address hits in the cache and the cache block is in the modified (M) state, the 
modified block is written back to memory and the cache block is placed in the exclusive (E) 
state. 


The execution of a dcbst instruction does not broadcast on the 60x bus unless broadcast is 
enabled through the HID0[ABE] bit. The function of this instruction is independent of the 
WIMG bit settings of the block containing the effective address. The dcbst instruction 
executes regardless of whether the cache is disabled or locked; however, a BAT or TLB 
protection violation generates a DSI exception. 


3.4.2.4 Data Cache Block Flush (dcbf) 


The effective address is computed, translated, and checked for protection violations as 
defined in the PowerPC architecture. This instruction is treated as a load with respect to 
address translation and memory protection. 


If the address hits in the cache, and the block is in the modified (M) state, the modified block 
is written back to memory and the cache block is placed in the invalid (I) state. If the address 
hits in the cache, and the cache block is in the exclusive (E) state, the cache block is placed 
in the invalid (I) state. If the address misses in the cache, no action is taken. 


The execution of dcbf does not broadcast on the 60x bus unless broadcast is enabled 
through the HID0[ABE] bit. The function of this instruction is independent of the WIMG 
bit settings of the block containing the effective address. The dcbf instruction executes 
regardless of whether the cache is disabled or locked; however, a BAT or TLB protection 
violation generates a DSI exception. 


3.4.2.5 Data Cache Block Invalidate (dcbi) 


The effective address is computed, translated, and checked for protection violations as 
defined in the PowerPC architecture. This instruction is treated as a store with respect to 
address translation and memory protection. 


If the address hits in the cache, the cache block is placed in the invalid (I) state, regardless 
of whether the data is modified. Because this instruction may effectively destroy modified 
data, it is privileged (that is, dcbi is available to programs at the supervisor privilege level, 
MSR[PR] = 0). 


The execution of dcbi does not broadcast on the 60x bus unless broadcast is enabled 
through the HID0[ABE] bit. The function of this instruction is independent of the WIMG 
bit settings of the block containing the effective address. The dcbi instruction executes 
regardless of whether the cache is disabled or locked; however, a BAT or TLB protection 
violation generates a DSI exception. 
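

The effect of dcbst, dcbf, and dcbi on a single data cache block can be written as a small MEI transition function. This is a sketch with a hypothetical name (`cache_block_op`); a miss is represented by the invalid state, and treating a dcbst or dcbi miss as "no action" is an assumption, since the text states this explicitly only for dcbf.

```python
def cache_block_op(op, state):
    """Apply dcbst, dcbf, or dcbi to a block in MEI state 'M', 'E', or 'I'.

    Returns (new_state, written_back), where written_back tells whether
    the modified data is copied back to memory.
    """
    if op not in ("dcbst", "dcbf", "dcbi"):
        raise ValueError("unsupported operation: " + op)
    if state not in ("M", "E", "I"):
        raise ValueError("unknown MEI state: " + state)

    if state == "I":
        return ("I", False)          # miss: nothing to do
    if op == "dcbst":
        return ("E", state == "M")   # write back if modified, keep the block
    if op == "dcbf":
        return ("I", state == "M")   # write back if modified, then invalidate
    return ("I", False)              # dcbi: invalidate, modified data is lost
```

The last case shows why dcbi is privileged: a modified block goes to the invalid state without being written back.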


3.4.2.6 Instruction Cache Block Invalidate (icbi) 


For the icbi instruction, the effective address is not computed or translated, so it cannot 
generate a protection violation or exception. This instruction performs a virtual lookup into 
the instruction cache (index only). All ways of the selected instruction cache set are 
invalidated. 


The icbi instruction is not broadcast on the 60x bus. The icbi instruction invalidates the 
cache blocks independent of whether the cache is disabled or locked. 
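

The index-only lookup can be sketched as follows. The geometry (128 sets, eight ways, 32-byte blocks) is taken from this chapter; the function names and the list-of-lists representation of the valid bits are hypothetical and used only for illustration.

```python
BLOCK_BYTES = 32   # four beats of 64 bits
NUM_SETS = 128
NUM_WAYS = 8


def cache_set_index(ea):
    """Select the set from the address bits just above the block offset."""
    return (ea // BLOCK_BYTES) % NUM_SETS


def icbi(valid_bits, ea):
    """Invalidate all ways of the set selected by ea; return the set index.

    valid_bits is a list of NUM_SETS lists, each holding NUM_WAYS booleans.
    No tag comparison is made, which mirrors the index-only lookup.
    """
    index = cache_set_index(ea)
    valid_bits[index] = [False] * NUM_WAYS
    return index
```

Because no tag is compared, an icbi to one address also invalidates any other cached blocks that happen to share the same set.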


3.5 Cache Operations 


This section describes the MPC750 cache operations. 


3.5.1 Cache Block Replacement/Castout Operations 


Both the instruction and data cache use a pseudo least-recently-used (PLRU) replacement 
algorithm when a new block needs to be placed in the cache. When the data to be replaced 
is in the modified (M) state, that data is written into a castout buffer while the missed data 
is being accessed on the bus. When the load completes, the MPC750 then pushes the 
replaced cache block from the castout buffer to the L2 cache (if L2 is enabled) or to main 
memory (if L2 is disabled). 


The replacement logic first checks to see if there are any invalid blocks in the set and 
chooses the lowest-order, invalid block (L[0-7]) as the replacement target. If all eight 
blocks in the set are valid, the PLRU algorithm is used to determine which block should be 
replaced. The PLRU algorithm is shown in Figure 3-5. 


Each cache is organized as eight blocks per set by 128 sets. There is a valid bit for each 
block in the cache, L[0-7]. When all eight blocks in the set are valid, the PLRU algorithm 
is used to select the replacement target. There are seven PLRU bits, B[0-6], for each set in 
the cache. For every hit in the cache, the PLRU bits are updated using the rules specified 
in Table 3-2. 


Table 3-2. PLRU Bit Update Rules 


If the Current         Then the PLRU bits are Changed to: 
Access is To:     B0    B1    B2    B3    B4    B5    B6 
L0                1     1     x     1     x     x     x 
L1                1     1     x     0     x     x     x 
L2                1     0     x     x     1     x     x 
L3                1     0     x     x     0     x     x 
L4                0     x     1     x     x     1     x 
L5                0     x     1     x     x     0     x 
L6                0     x     0     x     x     x     1 
L7                0     x     0     x     x     x     0 


x = Does not change 
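

The update rules in Table 3-2 translate directly into a lookup table. The Python sketch below models them together with replacement-target selection. The update table follows Table 3-2; the selection function `plru_victim` is an assumption based on the usual tree-PLRU decoding, chosen so that it is the inverse of the update rules and so that all-zero PLRU bits point to L0. All function names are hypothetical.

```python
# For each accessed way, the PLRU bits that are written (others unchanged).
PLRU_UPDATE = {
    0: {0: 1, 1: 1, 3: 1},
    1: {0: 1, 1: 1, 3: 0},
    2: {0: 1, 1: 0, 4: 1},
    3: {0: 1, 1: 0, 4: 0},
    4: {0: 0, 2: 1, 5: 1},
    5: {0: 0, 2: 1, 5: 0},
    6: {0: 0, 2: 0, 6: 1},
    7: {0: 0, 2: 0, 6: 0},
}


def plru_touch(bits, way):
    """Return the PLRU bits B0..B6 after an access to the given way."""
    updated = list(bits)
    for index, value in PLRU_UPDATE[way].items():
        updated[index] = value
    return updated


def plru_victim(bits):
    """Assumed decoding: walk the three-level tree away from recent accesses."""
    if bits[0] == 0:
        if bits[1] == 0:
            return 0 if bits[3] == 0 else 1
        return 2 if bits[4] == 0 else 3
    if bits[2] == 0:
        return 4 if bits[5] == 0 else 5
    return 6 if bits[6] == 0 else 7


def choose_replacement(valid, bits):
    """Prefer the lowest-order invalid block; otherwise use the PLRU bits."""
    for way, is_valid in enumerate(valid):
        if not is_valid:
            return way
    return plru_victim(bits)
```

Under this decoding, repeatedly replacing and then touching the selected block starting from cleared PLRU bits visits all eight blocks of a set before any block is selected a second time.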


If all eight blocks are valid, then a block is selected for replacement according to the PLRU 
bit encodings shown in Table 3-3. 


Table 3-3. PLRU Replacement Block Selection 


During power-up or hard reset, all the valid bits of the blocks are cleared and the PLRU bits 
are cleared to point to block L0 of each set. Note that this is also the state of the data or 
instruction cache after setting their respective flash invalidate bit (HID0[DCFI] or 
HID0[ICFI]). 


3.5.2 Cache Flush Operations 


The instruction cache can be invalidated by executing a series of icbi instructions or by 
setting HID0[ICFI]. The data cache can be invalidated by executing a series of dcbi 
instructions or by setting HID0[DCFI]. 


Any modified entries in the data cache can be copied back to memory (flushed) by using 
the dcbf instruction or by executing a series of 12 uniquely addressed load or dcbz 
instructions to each of the 128 sets. The address space should not be shared with any other 
process to prevent snoop hit invalidations during the flushing routine. Exceptions should be 
disabled during this time so that the PLRU algorithm does not get disturbed. 


The data cache flush assist bit, HID0[DCFA], simplifies the software flushing process. 
When set, HID0[DCFA] forces the PLRU replacement algorithm to ignore the invalid 
entries and follow the replacement sequence defined by the PLRU bits. This reduces the 
series of uniquely addressed load or dcbz instructions to eight per set. HID0[DCFA] should 
be set just prior to the beginning of the cache flush routine and cleared after the series of 
instructions is complete. 
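

The address sequence for such a flush routine can be sketched in Python. This only generates addresses; on real hardware each address would be the target of a load or dcbz. The function name `flush_addresses` and the `base` parameter are hypothetical, and the sketch assumes `base` is aligned to a 32-byte block and points to a private, cacheable region that is otherwise unused.

```python
BLOCK_BYTES = 32
NUM_SETS = 128
WAY_STRIDE = BLOCK_BYTES * NUM_SETS  # 4 Kbytes: addresses this far apart share a set


def flush_addresses(base, accesses_per_set=12):
    """Yield uniquely addressed blocks, accesses_per_set for each of the 128 sets.

    Use accesses_per_set=12 for the plain routine, or 8 when the
    flush assist bit is set for the duration of the routine.
    """
    for k in range(accesses_per_set):
        for set_index in range(NUM_SETS):
            yield base + k * WAY_STRIDE + set_index * BLOCK_BYTES
```

The sequence is simply a linear sweep over a contiguous region with a 32-byte stride: 48 Kbytes for the 12-access routine and 32 Kbytes when only eight accesses per set are needed.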


3.5.3 Data Cache-Block-Fill Operations 


The MPC750’s data cache blocks are filled in four beats of 64 bits each, with the critical 
double word loaded first. The data cache is not blocked to internal accesses while the load 
(caused by a cache miss) completes. This functionality is sometimes referred to as ‘hits 
under misses,’ because the cache can service a hit while a cache miss fill is waiting to 
complete. The critical-double-word read from memory is simultaneously written to the data 
cache and forwarded to the requesting unit, thus minimizing stalls due to cache fill latency. 


A cache block is filled after a read miss or write miss (read-with-intent-to-modify) occurs 
in the cache. The cache block that corresponds to the missed address is updated by a burst 
transfer of the data from the L2 or system memory. Note that if a read miss occurs in a 
system with multiple bus masters, and the data is modified in another cache, the modified 
data is first written to external memory before the cache fill occurs. 


3.5.4 Instruction Cache-Block-Fill Operations 


The MPC750’s instruction cache blocks are loaded in four beats of 64 bits each, with the 
critical double word loaded first. The instruction cache is not blocked to internal accesses 
while the fetch (caused by a cache miss) completes. On a cache miss, the critical and 
following double words read from memory are simultaneously written to the instruction 
cache and forwarded to the instruction queue, thus minimizing stalls due to cache fill 
latency. There is no snooping of the instruction cache. 


3.5.5 Data Cache-Block-Push Operation 


When a cache block in the MPC750 is snooped and hit by another bus master and the data 
is modified, the cache block must be written to memory and made available to the snooping 
device. The cache block that is hit is said to be pushed out onto the 60x bus. The MPC750 
supports two kinds of push operations—normal push operations and enveloped 
high-priority push operations, which are described in Section 3.5.5.1, “Enveloped 
High-Priority Cache-Block-Push Operation.” 


3.5.5.1 Enveloped High-Priority Cache-Block-Push Operation 


In cases where the MPC750 has completed the address tenure of a read operation, and then 
detects a snoop hit to a modified cache block by another bus master, the MPC750 provides 
a high-priority push operation. If the address snooped is the same as the address of the data 
to be returned by the read operation, ARTRY is asserted one or more times until the data 
tenure of the read operation is completed. The cache-block-push transaction can be 
enveloped within the address and data tenures of a read operation. This feature prevents 
deadlocks in system organizations that support multiple memory-mapped buses. 





More specifically, the MPC750 internally detects the scenario where a load request is 
outstanding and the processor has pipelined a write operation on top of the load. Normally, 
when the data bus is granted to the MPC750, the resulting data bus tenure is used for the 
load operation. The enveloped high-priority cache block push feature defines a bus signal, 
data bus write only (DBWO), which when asserted with a qualified data bus grant indicates 
that the resulting data tenure should be used for the store operation instead. This signal is 
described in Section 8.10, “Using Data Bus Write Only.” Note that the enveloped 
copy-back operation is an internally pipelined bus operation. 


3.6 L1 Caches and 60x Bus Transactions 


The MPC750 transfers data to and from the cache in single-beat transactions of two words, 
or in four-beat transactions of eight words which fill a cache block. Single-beat bus 
transactions can transfer from one to eight bytes to or from the MPC750, and can be 
misaligned. Single-beat transactions can be caused by cache write-through accesses, 
caching-inhibited accesses (WIMG = x1xx), accesses when the cache is disabled 
(HID0[DCE] bit is cleared), or accesses that miss when the cache is locked (HID0[DLOCK] 
bit is set). 


Burst transactions on the MPC750 always transfer eight words of data at a time, and are 
aligned to a double-word boundary. The MPC750 transfer burst (TBST) output signal 
indicates to the system whether the current transaction is a single-beat transaction or 
four-beat burst transfer. Burst transactions have an assumed address order. For cacheable 
read operations, instruction fetches, or cacheable, non-write-through write operations that 
miss the cache, the MPC750 presents the double-word-aligned address associated with the 
load/store instruction or instruction fetch that initiated the transaction. 
As shown in Figure 3-6, the first quad word contains the address of the load/store or 
instruction fetch that missed the cache. This minimizes latency by allowing the critical code 
or data to be forwarded to the processor before the rest of the block is filled. For all other 
burst operations, however, the entire block is transferred in order (oct-word-aligned). 
Critical-double-word-first fetching on a cache miss applies to both the data and instruction 
cache. 
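
The critical-double-word-first ordering can be illustrated with a short sketch. It assumes that the remaining double words follow the critical one in ascending order and wrap around within the 32-byte block; the function name and the A-D labels are illustrative only and are not defined by the MPC750.

```python
BEATS_PER_BURST = 4  # four 8-byte beats fill one 32-byte cache block


def burst_beat_order(address):
    """Return the double-word labels (A-D) in the order a burst would deliver them."""
    # Address bits 27-28 (bit 0 is the most-significant bit) select the double word,
    # which corresponds to bits 3-4 when counting from the least-significant end.
    first = (address >> 3) & 0b11
    # Assumption of this sketch: later beats wrap around within the block.
    return ["ABCD"[(first + beat) % BEATS_PER_BURST] for beat in range(BEATS_PER_BURST)]
```

For a miss in double-word C, the sketch places C on the first beat so that the critical data can be forwarded before the rest of the block arrives.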


MPC750 Cache Address 
Bits (27... 28)         00    01    10    11 
Double word             A     B     C     D 

If the address requested is in double-word A, the address placed on the bus is that of double-word A, and 
the four data beats are ordered in the following manner: 

Beat                    0     1     2     3 
Double word             A     B     C     D 

If the address requested is in double-word C, the address placed on the bus will be that of double-word 
C, and the four data beats are ordered in the following manner: 

Beat                    0     1     2     3 
Double word             C     D     A     B 


Figure 3-6. Double-Word Address Ordering—Critical Double Word First 


3.6.1 Read Operations and the MEI Protocol 


The MEI coherency protocol affects how the MPC750 data cache performs read operations 
on the 60x bus. All reads (except for caching-inhibited reads) are encoded on the bus as 
read-with-intent-to-modify (RWITM) to force flushing of the addressed cache block from 
other caches in the system. 


The MEI coherency protocol also affects how the MPC750 snoops read operations on the 
60x bus. All reads snooped from the 60x bus (except for caching-inhibited reads) are 
interpreted as RWITM to cause flushing from the MPC750’s cache. Single-beat reads 
(TBST negated) are interpreted by the MPC750 as caching inhibited. 





These actions for read operations allow the MPC750 to operate successfully (coherently) 
on the bus with other bus masters that implement either the three-state MEI or a four-state 
MESI cache coherency protocol. 


3.6.2 Bus Operations Caused by Cache Control Instructions 


The cache control, TLB management, and synchronization instructions supported by the 
MPC750 may affect or be affected by the operation of the 60x bus. The operation of the 
instructions may also indirectly cause bus transactions to be performed, or their completion 
may be linked to the bus. 


The dcbz instruction is the only cache control instruction that causes an address-only 
broadcast on the 60x bus. All other data cache control instructions (dcbi, dcbf, and 
dcbst) are not broadcast unless specifically enabled through the HID0[ABE] configuration 
bit. Note that dcbi, dcbf, dcbst, and dcbz do broadcast to the MPC750’s L2 cache, 
regardless of HID0[ABE]. HID0[ABE] also controls the broadcast of the sync and eieio 
instructions. The icbi instruction is never broadcast. No broadcasts by other masters are 
snooped by the MPC750 (except for dcbz kill block transactions). For detailed information 
on the cache control instructions, refer to Chapter 2, “Programming Model,” in this book 
and Chapter 8, “Instruction Set,” in the Programming Environments Manual. 


Table 3-4 provides an overview of the bus operations initiated by cache control instructions. 
Note that Table 3-4 assumes that the WIM bits are set to 001; that is, the cache is operating 
in write-back mode, caching is permitted and coherency is enforced. 


Table 3-4. Bus Operations Caused by Cache Control Instructions (WIM = 001) 

Instruction | Current Cache State | Next Cache State | Bus Operation                           | Comment 
sync        | Don’t care          | No change        | sync (if enabled in HID0[ABE])          | Waits for memory queues to complete bus activity 
tlbie       | —                   | —                | None                                    | — 
tlbsync     | —                   | —                | None                                    | Waits for the negation of the TLBSYNC input signal to complete 
eieio       | Don’t care          | No change        | eieio (if enabled in HID0[ABE])         | Address-only bus operation 
icbi        | Don’t care          | I                | None                                    | — 
dcbi        | Don’t care          | I                | Kill block (if enabled in HID0[ABE])    | Address-only bus operation 
dcbf        | I, E                | I                | Flush block (if enabled in HID0[ABE])   | Address-only bus operation 
dcbf        | M                   | I                | Write with kill                         | Block is pushed 
dcbst       | I, E                | No change        | Clean block (if enabled in HID0[ABE])   | Address-only bus operation 
dcbst       | M                   | E                | Write with kill                         | Block is pushed 
dcbz        | I                   | M                | Write with kill                         | — 
dcbz        | E, M                | M                | Kill block                              | Writes over modified data 
dcbt        | I                   | E                | Read-with-intent-to-modify              | Fetched cache block is stored in the cache 
dcbt        | E, M                | No change        | None                                    | — 
dcbtst      | I                   | E                | Read-with-intent-to-modify              | Fetched cache block is stored in the cache 
dcbtst      | E, M                | No change        | None                                    | — 























For additional details about the specific bus operations performed by the MPC750, see 
Chapter 8, “System Interface Operation.” 


3.6.3 Snooping 


The MPC750 maintains data cache coherency in hardware by coordinating activity between 
the data cache, the bus interface logic, the L2 cache, and the memory system. The MPC750 
has a copy-back cache which relies on bus snooping to maintain cache coherency with other 
caches in the system. For the MPC750, the coherency size of the bus is the size of a cache 
block, 32 bytes. This means that any bus transactions that cross an aligned 32-byte 
boundary must present a new address onto the bus at that boundary for proper snoop 
operation by the MPC750, or they must operate noncoherently with respect to the MPC750. 


As bus operations are performed on the bus by other bus masters, the MPC750 bus 
snooping logic monitors the addresses and transfer attributes that are referenced. The 
MPC750 snoops the bus transactions during the cycle that TS is asserted for any of the 
following qualified snoop conditions: 


• The global signal (GBL) is asserted, indicating that coherency enforcement is 
required. 


• A reservation is currently active in the MPC750 as the result of an lwarx instruction, 
and the transfer type attributes (TT[0-4]) indicate a write or kill operation. To support 
reservations in the MEI cache protocol, these transactions are snooped regardless of 
whether GBL is asserted. 
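
The two qualified snoop conditions can be summarized in a minimal sketch. The TT[0-4] encodings are those listed in Table 3-5; treating exactly these four encodings as the "write or kill" operations is an assumption of the sketch, and the function and parameter names are illustrative.

```python
# TT[0-4] encodings from Table 3-5. Which encodings count as "write or kill"
# is an assumption made for this illustration.
WRITE_OR_KILL_TT = {
    0b01100,  # kill block
    0b00010,  # write-with-flush
    0b00110,  # write-with-kill
    0b10010,  # write-with-flush-atomic
}


def is_qualified_snoop(gbl_asserted, reservation_active, tt):
    """Return True if a transaction seen while TS is asserted should be snooped."""
    if gbl_asserted:
        return True  # coherency enforcement is required
    # With an active lwarx reservation, writes and kills are snooped even when
    # GBL is not asserted.
    return reservation_active and tt in WRITE_OR_KILL_TT
```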





The state of ABB is not sampled to determine a qualified snoop condition. All transactions 
snooped by the MPC750 are checked for correct address bus parity. Every assertion of TS 
detected by the MPC750 (whether snooped or not) must be followed by an accompanying 
assertion of AACK. 


Once a qualified snoop condition is detected on the bus, the snooped address associated 
with TS is compared against the data cache tags, memory queues, and/or other storage 
elements as appropriate. The L1 data cache tags and L2 cache tags are snooped for standard 
data cache coherency support. No snooping is done in the instruction cache for coherency. 
The memory queues are snooped for pipeline collisions and memory coherency collisions. 
A pipeline collision is detected when another bus master addresses any portion of a line that 
this MPC750’s data cache is currently in the process of loading (L1 loading from L2, or 
L1/L2 loading from memory). A memory coherency collision occurs when another bus 
master addresses any portion of a line that the MPC750 has currently queued to write to 
memory from the data cache (castout or copy-back), but has not yet been granted bus access 
to perform. 
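
The two memory queue collision types can be sketched as follows. The sketch assumes that the blocks being loaded and the blocks queued for write-back are tracked as sets of 32-byte-aligned addresses; these data structures and names are hypothetical and do not describe the actual queue implementation.

```python
CACHE_BLOCK_BYTES = 32  # coherency size stated in this section


def block_address(address):
    """Align an address down to its 32-byte cache block."""
    return address & ~(CACHE_BLOCK_BYTES - 1)


def classify_queue_collision(address, loading_blocks, queued_write_blocks):
    """Classify a snooped address against the (hypothetical) memory queue contents."""
    block = block_address(address)
    if block in loading_blocks:
        return "pipeline"            # block is currently being loaded
    if block in queued_write_blocks:
        return "memory coherency"    # castout or copy-back still waiting for the bus
    return None
```

Because the comparison is made on the block address, a snooped access to any byte of the block is detected.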


If a snooped transaction results in a cache hit or pipeline collision or memory queue 
collision, the MPC750 asserts ARTRY on the 60x bus. The current bus master, detecting 
the assertion of the ARTRY signal, should abort the transaction and retry it at a later time, 
so that the MPC750 can first perform a write operation back to memory from its cache or 
memory queues. The MPC750 may also retry a bus transaction if it is unable to snoop the 
transaction on that cycle due to internal resource conflicts. Additional snoop action may be 
forwarded to the cache as a result of a snoop hit in some cases (a cache push of modified 
data, or a cache block invalidation). 


3.6.4 Snoop Response to 60x Bus Transactions 


There are several bus transaction types defined for the 60x bus. The transactions in 
Table 3-5 correspond to the transfer type signals TT[0-4], which are described in 
Section 7.2.4.1, “Transfer Type (TT[0-4]).” 


Table 3-5. Response to Snooped Bus Transactions 

Snooped Transaction           TT[0-4]  MPC750 Response 
Clean block                   00000    No action is taken. 
Flush block                   00100    No action is taken. 
SYNC                          01000    No action is taken. 
Kill block                    01100    The kill block operation is an address-only bus transaction initiated 
                                       when a dcbz or dcbi instruction is executed. 
                                       • If the addressed cache block is in the exclusive (E) state, the 
                                         cache block is placed in the invalid (I) state. 
                                       • If the addressed cache block is in the modified (M) state, the 
                                         MPC750 asserts ARTRY and initiates a push of the modified block 
                                         out of the cache and the cache block is placed in the invalid (I) 
                                         state. 
                                       • If the address misses in the cache, no action is taken. 
                                       Any reservation associated with the address is canceled. 
EIEIO                         10000    No action is taken. 
External control word write   10100    No action is taken. 
TLB invalidate                11000    No action is taken. 
External control word read    11100    No action is taken. 
lwarx reservation set         00001    No action is taken. 
Reserved                      00101    — 
TLBSYNC                       01001    No action is taken. 
ICBI                          01101    No action is taken. 
Reserved                      1xx01    — 
Write-with-flush              00010    A write-with-flush operation is a single-beat or burst transaction 
                                       initiated when a caching-inhibited or write-through store 
                                       instruction is executed. 
                                       • If the addressed cache block is in the exclusive (E) state, the 
                                         cache block is placed in the invalid (I) state. 
                                       • If the addressed cache block is in the modified (M) state, the 
                                         MPC750 asserts ARTRY and initiates a push of the modified block 
                                         out of the cache and the cache block is placed in the invalid (I) 
                                         state. 
                                       • If the address misses in the cache, no action is taken. 
                                       Any reservation associated with the address is canceled. 
Write-with-kill               00110    A write-with-kill operation is a burst transaction initiated due to 
                                       a castout, caching-allowed push, or snoop copy-back. 
                                       • If the address hits in the cache, the cache block is placed in the 
                                         invalid (I) state (killing modified data that may have been in the 
                                         block). 
                                       • If the address misses in the cache, no action is taken. 
                                       Any reservation associated with the address is canceled. 
Read                          01010    A read operation is used by most single-beat and burst load 
                                       transactions on the bus. 
                                       For single-beat, caching-inhibited read transactions: 
                                       • If the addressed cache block is in the exclusive (E) state, the 
                                         cache block remains in the exclusive (E) state. 
                                       • If the addressed cache block is in the modified (M) state, the 
                                         MPC750 asserts ARTRY and initiates a push of the modified block 
                                         out of the cache and the cache block is placed in the exclusive 
                                         (E) state. 
                                       • If the address misses in the cache, no action is taken. 
                                       For burst read transactions: 
                                       • If the addressed cache block is in the exclusive (E) state, the 
                                         cache block is placed in the invalid (I) state. 
                                       • If the addressed cache block is in the modified (M) state, the 
                                         MPC750 asserts ARTRY and initiates a push of the modified block 
                                         out of the cache and the cache block is placed in the invalid (I) 
                                         state. 
                                       • If the address misses in the cache, no action is taken. 
Read-with-intent-to-modify    01110    A RWITM operation is issued to acquire exclusive use of a memory 
(RWITM)                                location for the purpose of modifying it. 
                                       • If the addressed cache block is in the exclusive (E) state, the 
                                         cache block is placed in the invalid (I) state. 
                                       • If the addressed cache block is in the modified (M) state, the 
                                         MPC750 asserts ARTRY and initiates a push of the modified block 
                                         out of the cache and the cache block is placed in the invalid (I) 
                                         state. 
                                       • If the address misses in the cache, no action is taken. 
Write-with-flush-atomic       10010    Write-with-flush-atomic operations occur after the processor issues 
                                       an stwcx. instruction. 
                                       • If the addressed cache block is in the exclusive (E) state, the 
                                         cache block is placed in the invalid (I) state. 
                                       • If the addressed cache block is in the modified (M) state, the 
                                         MPC750 asserts ARTRY and initiates a push of the modified block 
                                         out of the cache and the cache block is placed in the invalid (I) 
                                         state. 
                                       • If the address misses in the cache, no action is taken. 
                                       Any reservation is canceled, regardless of the address. 
Reserved                      10110    — 
Read-atomic                   11010    Read-atomic operations appear on the bus in response to lwarx 
                                       instructions and generate the same snooping responses as read 
                                       operations. 
Read-with-intent-to-modify-   11110    The RWITM-atomic operations appear on the bus in response to stwcx. 
atomic                                 instructions and generate the same snooping responses as RWITM 
                                       operations. 
Reserved                      00011    — 
Reserved                      00111    — 
Read-with-no-intent-to-cache  01011    A RWNITC operation is issued to acquire exclusive use of a memory 
(RWNITC)                               location with no intention of modifying the location. 
                                       • If the addressed cache block is in the exclusive (E) state, the 
                                         cache block remains in the exclusive (E) state. 
                                       • If the addressed cache block is in the modified (M) state, the 
                                         MPC750 asserts ARTRY and initiates a push of the modified block 
                                         out of the cache and the cache block is placed in the exclusive 
                                         (E) state. 
                                       • If the address misses in the cache, no action is taken. 
Reserved                      01111    — 
Reserved                      1xx11    — 
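
The cache-state portion of Table 3-5 can be condensed into a small lookup, shown here as a sketch rather than a description of the hardware. The transaction names are informal labels chosen for this example, reservation handling is omitted, and the atomic variants are assumed to be mapped by the caller onto their non-atomic equivalents.

```python
# Transactions that invalidate the block on a hit (after a push if it was modified).
INVALIDATING = {"kill block", "write-with-flush", "write-with-flush-atomic",
                "burst read", "rwitm"}
# Transactions that leave the block valid in the exclusive state.
NON_INVALIDATING = {"single-beat read", "rwnitc"}


def snoop_response(transaction, state):
    """Return (next_state, artry_and_push) for a snooped transaction.

    state is 'M', 'E', or 'I'; artry_and_push is True when the MPC750 asserts
    ARTRY and pushes the modified block.
    """
    if state == "I":
        return "I", False                  # miss: no action is taken
    if transaction == "write-with-kill":
        return "I", False                  # hit is invalidated, modified data is killed
    if transaction in INVALIDATING:
        return "I", state == "M"
    if transaction in NON_INVALIDATING:
        return "E", state == "M"
    return state, False                    # clean, flush, sync, and so on: no action
```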

















3.6.5 Transfer Attributes 


In addition to the address and transfer type signals, the MPC750 supports the transfer 
attribute signals TBST, TSIZ[0-2], WT, CI, and GBL. The TBST and TSIZ[0-2] signals 
indicate the data transfer size for the bus transaction. 


The WT signal reflects the write-through status (the complement of the W bit) for the 
transaction as determined by the MMU address translation during write operations. WT is 
asserted for burst writes due to dcbf (flush) and dcbst (clean) instructions, and for snoop 
pushes; WT is negated for ecowx transactions. Since the write-through status is not 
meaningful for reads, the MPC750 uses the WT signal during read transactions to indicate 
that the transaction is an instruction fetch (WT negated), or not an instruction fetch (WT 
asserted). 


The CI signal reflects the caching-inhibited/allowed status (the complement of the I bit) of 
the transaction as determined by the MMU address translation even if the L1 caches are 
disabled or locked. CI is always asserted for eciwx/ecowx bus transactions independent of 
the address translation. 


The GBL signal reflects the memory coherency requirements (the complement of the M bit) 
of the transaction as determined by the MMU address translation. Castout and snoop 
copy-back operations (TT[0-4] = 00110) are generally marked as nonglobal (GBL 
negated) and are not snooped (except for reservation monitoring). Other masters, however, 
may perform DMA write operations with this encoding but marked global (GBL asserted) 
and thus must be snooped. 
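
The complement relationship between the WIM bits and the attribute signals can be written out as a sketch. It covers only the general case of a translated write transaction; the special cases described above (reads, eciwx/ecowx, castouts, and snoop copy-backs) are not modeled, and the function name is illustrative. The sketch also assumes that the three signals are active-low, so that a level of 0 represents an asserted signal.

```python
def transfer_attribute_levels(w, i, m):
    """Map WIM bits (each 0 or 1) to the logic levels on WT, CI, and GBL.

    Each signal carries the complement of its bit. Assuming active-low signals,
    a level of 0 means the signal is asserted.
    """
    return {"WT": 1 - w, "CI": 1 - i, "GBL": 1 - m}
```

For example, a write-back, caching-allowed, coherency-required page (WIM = 001) yields a GBL level of 0, which marks the transaction as global.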


Table 3-6 summarizes the address and transfer attribute information presented on the bus 
by the MPC750 for various master or snoop-related transactions. 


Table 3-6. Address/Transfer Attribute Summary 

Bus Transaction | A[0-31] | TT[0-4] | TBST | TSIZ[0-2] | GBL | WT | CI 
Instruction fetch operations: 
Burst (caching-allowed) | PA[0-28] || 0b000 | 01110 | 0 | 010 | ¬M | 1 | 1* 
Single-beat read (caching-inhibited or cache disabled) | PA[0-28] || 0b000 | 01010 | 1 | 000 | ¬M | 1 | ¬I 
Data cache operations: 
Cache block fill (due to load or store miss) | PA[0-28] || 0b000 | A1110 | 0 | 010 | ¬M | 0 | 1* 
Castout (normal replacement) | CA[0-26] || 0b00000 | 00110 | 0 | 010 | 1 | 1 | 1* 
Push (cache block push due to dcbf/dcbst) | PA[0-26] || 0b00000 | 00110 | 0 | 010 | 1 | 0 | 1* 
Snoop copyback | CA[0-26] || 0b00000 | 00110 | 0 | 010 | 1 | 0 | 1* 
Data cache bypass operations: 
Single-beat read (caching-inhibited or cache disabled) | PA[0-31] | A1010 | 1 | SSS | ¬M | 0 | ¬I 
Single-beat write (caching-inhibited, write-through, or cache disabled) | PA[0-31] | 00010 | 1 | SSS | ¬M | ¬W | ¬I 
Special instructions: 
dcbz (addr-only) | PA[0-28] || 0b000 | 01100 | 0 | 010 | 0* | 0 | 1* 
dcbi (if HID0[ABE] = 1, addr-only) | PA[0-26] || 0b00000 | 01100 | 0 | 010 | ¬M | 0 | 1* 
dcbf (if HID0[ABE] = 1, addr-only) | PA[0-26] || 0b00000 | 00100 | 0 | 010 | ¬M | 0 | 1* 
dcbst (if HID0[ABE] = 1, addr-only) | PA[0-26] || 0b00000 | 00000 | 0 | 010 | ¬M | 0 | 1* 
sync (if HID0[ABE] = 1, addr-only) | 0x0000_0000 | 01000 | 0 | 010 | 0 | 0 | 0 
eieio (if HID0[ABE] = 1, addr-only) | 0x0000_0000 | 10000 | 0 | 010 | 0 | 0 | 0 
stwcx. (always single-beat write) | PA[0-29] || 0b00 | 10010 | 1 | 100 | ¬M | ¬W | ¬I 
eciwx | PA[0-29] || 0b00 | 11100 | EAR[28-31] (TBST and TSIZ[0-2]) | 1 | 0 | 0 
ecowx | PA[0-29] || 0b00 | 10100 | EAR[28-31] (TBST and TSIZ[0-2]) | 1 | 1 | 0 

Notes: 

PA = Physical address, CA = Cache address. 

W,I,M = WIM state from address translation; ¬ = complement; 0* or 1* = WIM state implied by transaction type in table. 
For instruction fetches, reflection of the M bit must be enabled through HID0[IFEM]. 

A = Atomic; high if lwarx, low otherwise 

S = Transfer size 

Special instructions listed may not generate bus transactions depending on cache state. 
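
The A[0-31] column of Table 3-6 concatenates the upper physical address bits with a field of zeros. A minimal sketch of that address formation follows; the function name and its parameters are illustrative and are not part of any MPC750 interface.

```python
def bus_address(physical_address, low_zero_bits):
    """Form PA[0-(31-n)] || n zero bits for a 32-bit address.

    low_zero_bits is 3 for double-word alignment (PA[0-28] || 0b000), 5 for
    cache-block alignment (PA[0-26] || 0b00000), and 2 for word alignment
    (PA[0-29] || 0b00).
    """
    mask = (1 << low_zero_bits) - 1
    return physical_address & ~mask & 0xFFFF_FFFF
```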


3.7 Bus Interface 


The bus interface buffers bus requests from the instruction and data caches, and executes 
the requests per the 60x bus protocol. It includes address register queues, prioritizing logic, 
and bus control logic. The bus interface also captures snoop addresses for snooping in the 
cache and in the address register queues, snoops for reservations, and holds the touch load 
address for the cache. All data storage for the address register buffers (load and store data 
buffers) is located in the cache section. The data buffers are considered temporary storage 
for the cache and not part of the bus interface. 

The general functions and features of the bus interface are as follows: 

• Seven address register buffers that include the following: 
  — Instruction cache load address buffer 
  — Data cache load address buffer 
  — Two data cache castout/store address buffers (associated data block buffers 
    located in cache) 
  — Data cache snoop copy-back address buffer (associated data block buffer located 
    in cache) 
  — Reservation address buffer for snoop monitoring 
• Pipeline collision detection for data cache buffers 
• Reservation address snooping for lwarx/stwcx. instructions 
• One-level address pipelining 
• Load ahead of store capability 

A conceptual block diagram of the bus interface is shown in Figure 3-7. The address 
register queues in the figure hold transaction requests that the bus interface may issue on 
the bus independently of the other requests. The bus interface may have up to two 
transactions operating on the bus at any given time through the use of address pipelining. 
Figure 3-7. Bus Interface Address Buffers 


For additional information about the MPC750 bus interface and the bus protocols, refer to 
Chapter 8, “System Interface Operation.” 


3.8 MEI State Transactions 


Table 3-7 shows MEI state transitions for various operations. Bus operations are described 
in Table 3-5. 

Table 3-7. MEI State Transitions 

Operation | Cache Operation | Bus sync | WIM | Current Cache State | Next Cache State | Cache Actions | Bus Operation 
Load (T = 0) | Read | No | x0x | I | Same | 1 Cast out of modified block (as required) | Write-with-kill 
 | | | | | | 2 Pass four-beat read to memory queue | Read 
Load (T = 0) | Read | No | x0x | E,M | Same | Read data from cache | — 
Load (T = 0) | Read | No | x1x | I | Same | Pass single-beat read to memory queue | Read 
Load (T = 0) | Read | No | x1x | E | I | CRTRY read | — 
Load (T = 0) | Read | No | x1x | M | I | CRTRY read (push sector to write queue) | Write-with-kill 
lwarx | Read | Acts like other reads but bus operation uses special encoding 
Store (T = 0) | Write | No | 00x | I | Same | Cast out of modified block (if necessary) | Write-with-kill 
 | | | | | | Pass RWITM to memory queue | RWITM 
Store (T = 0) | Write | No | 00x | E,M | M | Write data to cache | — 
Store ≠ stwcx. (T = 0) | Write | No | 10x | I | Same | Pass single-beat write to memory queue | Write-with-flush 
Store ≠ stwcx. (T = 0) | Write | No | 10x | E | Same | Write data to cache | — 
 | | | | | | Pass single-beat write to memory queue | Write-with-flush 
Store ≠ stwcx. (T = 0) | Write | No | 10x | M | Same | CRTRY write | — 
 | | | | | | Push block to write queue | Write-with-kill 
Store (T = 0) or stwcx. (WIM = 10x) | Write | No | x1x | I | Same | Pass single-beat write to memory queue | Write-with-flush 
Store (T = 0) or stwcx. (WIM = 10x) | Write | No | x1x | E | I | CRTRY write | — 
Store (T = 0) or stwcx. (WIM = 10x) | Write | No | x1x | M | I | CRTRY write | — 
 | | | | | | Push block to write queue | Write-with-kill 
stwcx. | Conditional write | If the reserved bit is set, this operation is like other writes except the bus operation uses a special encoding. 
dcbf | Data cache block flush | No | xxx | I,E | Same | CRTRY dcbf | — 
 | | | | | | Pass flush | Flush 
 | | | | Same | I | State change only | — 
dcbf | Data cache block flush | No | xxx | M | I | Push block to write queue | Write-with-kill 
dcbst | Data cache block store | No | xxx | I,E | Same | CRTRY dcbst | — 
 | | | | | | Pass clean | Clean 
 | | | | Same | Same | No action | — 
dcbst | Data cache block store | No | xxx | M | E | Push block to write queue | Write-with-kill 
dcbz | Data cache block set to zero | No | x1x | x | x | Alignment trap | — 
dcbz | Data cache block set to zero | No | 10x | x | x | Alignment trap | — 
dcbz | Data cache block set to zero | Yes | 00x | I | Same | CRTRY dcbz | — 
 | | | | | | Cast out of modified block | Write-with-kill 
 | | | | | | Pass kill | Kill 
 | | | | Same | M | Clear block | — 
dcbz | Data cache block set to zero | No | 00x | E,M | M | Clear block | — 
dcbt | Data cache block touch | No | x1x | I | Same | Pass single-beat read to memory queue | Read 
dcbt | Data cache block touch | No | x1x | E | I | CRTRY read | — 
dcbt | Data cache block touch | No | x1x | M | I | CRTRY read | — 
 | | | | | | Push block to write queue | Write-with-kill 
dcbt | Data cache block touch | No | x0x | I | Same | Cast out of modified block (as required) | Write-with-kill 
 | | | | | | Pass four-beat read to memory queue | Read 
dcbt | Data cache block touch | No | x0x | E,M | Same | No action | — 
Single-beat read | Reload dump 1 | No | xxx | I | Same | Forward data_in | — 
Four-beat read (double-word-aligned) | Reload dump | No | xxx | I | E | Write data_in to cache | — 
Four-beat write (double-word-aligned) | Reload dump | No | xxx | I | M | Write data_in to cache | — 
E→I | Snoop write or kill | No | xxx | E | I | State change only (committed) | — 
M→I | Snoop kill | No | xxx | M | I | State change only (committed) | — 
Push M→I | Snoop flush | No | xxx | M | I | Conditionally push | Write-with-kill 
Push M→E | Snoop clean | No | xxx | M | E | Conditionally push | Write-with-kill 
tlbie | TLB invalidate | No | xxx | x | x | CRTRY TLBI | — 
 | | | | | | Pass TLBI | — 
 | | | | | | No action | — 
sync | Synchronization | No | xxx | x | x | CRTRY sync | — 
 | | | | | | Pass sync | — 
 | | | | | | No action | — 

Note that single-beat writes are not snooped in the write queue. 
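
As an illustration of how the rows of Table 3-7 combine, the following sketch gives the state of a block after a cacheable, write-back access (WIM = 00x) has completed, including the reload that follows a miss. It is a simplification under stated assumptions: any castout of a replaced modified block is ignored, only loads and stores are modeled, and the function name is hypothetical.

```python
def mei_after_cacheable_access(op, state):
    """Return (final_state, bus_operation) for a load or store with WIM = 00x.

    state is the MEI state before the access: 'M', 'E', or 'I'.
    bus_operation is None when the access is satisfied from the cache.
    """
    if op == "load":
        if state == "I":
            return "E", "Read"     # four-beat read reloads the block as exclusive
        return state, None         # hit: data is read from the cache
    if op == "store":
        if state == "I":
            return "M", "RWITM"    # block is fetched with intent to modify
        return "M", None           # hit: data is written to the cache
    raise ValueError("op must be 'load' or 'store'")
```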
Chapter 4 
Exceptions 


This chapter describes the exceptions model for the MPC750. Note that the MPC755 
microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply 
for the MPC755 except as noted in Appendix C, “MPC755 Embedded G3 
Microprocessor.” 


The OEA portion of the PowerPC architecture defines the mechanism by which processors 
of this family implement exceptions (referred to as interrupts in the architecture 
specification). Exception conditions may be defined at other levels of the architecture. For 
example, the UISA defines conditions that may cause floating-point exceptions; the OEA 
defines the mechanism by which the exception is taken. 


The PowerPC exception mechanism allows the processor to change to supervisor state as a 
result of unusual conditions arising in the execution of instructions and from external 
signals, bus errors, or various internal conditions. When exceptions occur, information 
about the state of the processor is saved to certain registers and the processor begins 
execution at an address (exception vector) predetermined for each exception. Processing of 
exceptions begins in supervisor mode. 


Although multiple exception conditions can map to a single exception vector, often a more 
specific condition may be determined by examining a register associated with the 
exception—for example, the DSISR and the floating-point status and control register 
(FPSCR). Also, software can explicitly enable or disable some exception conditions. 


The PowerPC architecture requires that exceptions be taken in program order; therefore, 
although a particular implementation may recognize exception conditions out of order, they 
are handled strictly in order with respect to the instruction stream. When an 
instruction-caused exception is recognized, any unexecuted instructions that appear earlier 
in the instruction stream, including any that have not yet entered the execute state, are 
required to complete before the exception is taken. For example, if a single instruction 
encounters multiple exception conditions, those exceptions are taken and handled 
sequentially. Likewise, exceptions that are asynchronous and precise are recognized when 
they occur, but are not handled until all instructions currently in the execute stage 
successfully complete execution and report their results. 


To prevent loss of state information, exception handlers must save the information stored 
in the machine status save/restore registers, SRR0 and SRR1, soon after the exception is 
taken, before another exception can overwrite it. 
Because exceptions can occur while an exception handler routine is executing, multiple 
exceptions can become nested. It is up to the exception handler to save the necessary state 
information if control is to return to the excepting program. 
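
The need to save SRR0 before exceptions nest can be shown with a toy model. The class below is purely illustrative: it tracks only a program counter and SRR0 (SRR1 is omitted for brevity), its rfi method stands in for the return-from-interrupt instruction, and the addresses are arbitrary example values.

```python
class ToyCpu:
    """Minimal model: taking an exception overwrites SRR0 with the return address."""

    def __init__(self, pc):
        self.pc = pc
        self.srr0 = 0

    def take_exception(self, vector):
        self.srr0 = self.pc   # the previous SRR0 contents are lost here
        self.pc = vector

    def rfi(self):
        self.pc = self.srr0   # return to the address held in SRR0


def run_nested(save_srr0):
    cpu = ToyCpu(pc=0x1000)
    cpu.take_exception(0x00500)   # outer exception: SRR0 = 0x1000
    saved = cpu.srr0              # handler copies SRR0 to its own storage
    cpu.pc = 0x00520              # outer handler is now executing
    cpu.take_exception(0x00300)   # nested exception: SRR0 = 0x00520
    cpu.rfi()                     # inner handler returns to the outer handler
    if save_srr0:
        cpu.srr0 = saved          # outer handler restores SRR0 before returning
    cpu.rfi()
    return cpu.pc
```

With the save and restore in place, the outer handler returns to the interrupted program; without it, the return address has been lost and the final rfi goes back into the handler.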


In many cases, after the exception handler handles an exception, there is an attempt to 
execute the instruction that caused the exception. Instruction execution continues until the 
next exception condition is encountered. Recognizing and handling exception conditions 
sequentially guarantees that the machine state is recoverable and processing can resume 
without losing instruction results. 


In this book, the following terms are used to describe the stages of exception processing: 


Recognition Exception recognition occurs when the condition that can cause an 
exception is identified by the processor. 


Taken An exception is said to be taken when control of instruction 
execution is passed to the exception handler; that is, the context is 
saved and the instruction at the appropriate vector offset is fetched 
and the exception handler routine is begun in supervisor mode. 


Handling Exception handling is performed by the software linked to the 
appropriate vector offset. Exception handling is begun in supervisor 
mode (referred to as privileged state in the architecture 
specification). 


Note that the PowerPC architecture documentation refers to exceptions as interrupts. In this 
book, the term ‘interrupt’ is reserved to refer to asynchronous exceptions and sometimes to 
the event that causes the exception. Also, the PowerPC architecture uses the word 
‘exception’ to refer to IEEE-defined floating-point exception conditions that may cause a 
program exception to be taken; see Section 4.5.7, “Program Exception (0x00700).” The 
occurrence of these IEEE exceptions may not cause an exception to be taken. IEEE-defined 
exceptions are referred to as IEEE floating-point exceptions or floating-point exceptions. 


4.1 MPC750 Microprocessor Exceptions 


As specified by the PowerPC architecture, exceptions can be either precise or imprecise and 
either synchronous or asynchronous. Asynchronous exceptions are caused by events 
external to the processor’s execution; synchronous exceptions are caused by instructions. 
The types of exceptions are shown in Table 4-1. Note that all exceptions except for the 
system management interrupt, thermal management, and performance monitor exception 
are defined, at least to some extent, by the PowerPC architecture. 


Table 4-1. MPC750 Microprocessor Exception Classifications 

Synchronous/Asynchronous    Precise/Imprecise  Exception Types 
Asynchronous, nonmaskable   Imprecise          Machine check, system reset 
Asynchronous, maskable      Precise            External interrupt, decrementer, system management interrupt, 
                                               performance monitor interrupt, thermal management interrupt 
Synchronous                 Precise            Instruction-caused exceptions 








These classifications are discussed in greater detail in Section 4.2, “Exception Recognition 
and Priorities.” For a better understanding of how the MPC750 implements precise 
exceptions, see Chapter 6, “Instruction Timing.” Exceptions implemented in the MPC750, 
and conditions that cause them, are listed in Table 4-2. 


Table 4-2. Exceptions and Conditions 

Exception Type               Vector Offset  Causing Conditions 
                             (hex) 
Reserved                     00000          — 
System reset                 00100          Assertion of either HRESET or SRESET or at power-on reset 
Machine check                00200          Assertion of TEA during a data bus transaction, assertion of MCP, or an 
                                            address, data, or L2 bus parity error. MSR[ME] must be set. 
DSI                          00300          As specified in the PowerPC architecture. For TLB misses on load, store, 
                                            or cache operations, a DSI exception occurs if a page fault occurs. 
ISI                          00400          As defined by the PowerPC architecture 
External interrupt           00500          MSR[EE] = 1 and INT is asserted 
Alignment                    00600          • A floating-point load/store, stmw, stwcx., lmw, lwarx, eciwx, or ecowx 
                                              instruction operand is not word-aligned. 
                                            • A multiple/string load/store operation is attempted in little-endian 
                                              mode. 
                                            • An operand of a dcbz instruction is on a page that is write-through or 
                                              cache-inhibited for a virtual mode access. 
                                            • An attempt to execute a dcbz instruction occurs when the cache is 
                                              disabled. 
Program                      00700          As defined by the PowerPC architecture 
Floating-point unavailable   00800          As defined by the PowerPC architecture 
Decrementer                  00900          As defined by the PowerPC architecture, when the most-significant bit of 
                                            the DEC register changes from 0 to 1 and MSR[EE] = 1 
Reserved                     00A00-00BFF    — 
System call                  00C00          Execution of the System Call (sc) instruction 
Trace                        00D00          MSR[SE] = 1 or a branch instruction is completing and MSR[BE] = 1. The 
                                            MPC750 differs from the OEA by not taking this exception on an isync. 
Reserved                     00E00          The MPC750 does not generate an exception to this vector. Other 
                                            processors that implement the PowerPC architecture may use this vector 
                                            for floating-point assist exceptions. 
Reserved                     00E10-00EFF    — 
Performance monitor          00F00          The limit specified in PMCn is met and MMCR0[ENINT] = 1 
                                            (MPC750-specific) 
Instruction address          01300          IABR[0-29] matches EA[0-29] of the next instruction to complete, 
breakpoint                                  IABR[TE] matches MSR[IR], and IABR[BE] = 1 (MPC750-specific) 
System management interrupt  01400          MSR[EE] = 1 and SMI is asserted (MPC750-specific) 
Reserved                     01500-016FF    — 
Thermal management           01700          Thermal management is enabled, junction temperature exceeds the 
interrupt                                   threshold specified in THRM1 or THRM2, and MSR[EE] = 1 (MPC750-specific) 
Reserved                     01800-02FFF    — 








4.2 Exception Recognition and Priorities 


Exceptions are roughly prioritized by exception class, as follows: 


1. Nonmaskable, asynchronous exceptions have priority over all other 
exceptions—system reset and machine check exceptions (although the machine 
check exception condition can be disabled so the condition causes the processor to 
go directly into the checkstop state). These exceptions cannot be delayed and do not 
wait for completion of any precise exception handling. 


2. Synchronous, precise exceptions are caused by instructions and are taken in strict 
program order. 


3. Imprecise exceptions (imprecise mode floating-point enabled exceptions) are 
caused by instructions and they are delayed until higher priority exceptions are 
taken. Note that the MPC750 does not implement an exception of this type. 


4. Maskable asynchronous exceptions (external interrupt, decrementer, thermal
management, system management, and performance monitor exceptions) are delayed
until higher priority exceptions are taken.


The following list of exception categories describes how the MPC750 handles exceptions 
up to the point of signaling the appropriate interrupt to occur. Note that a recoverable state 
is reached if the completed store queue is empty (drained, not canceled) and any instruction 
that is next in program order and has been signaled to complete has completed. If 
MSR[RI] = 0, the MPC750 is in a nonrecoverable state. Also, instruction completion is 
defined as updating all architectural registers associated with that instruction, and then 
removing that instruction from the completion buffer.


• Exceptions caused by asynchronous events (interrupts). These exceptions are further
distinguished by whether they are maskable and recoverable.


— Asynchronous, nonmaskable, nonrecoverable 


System reset for assertion of HRESET—Has highest priority and is taken 
immediately regardless of other pending exceptions or recoverability. (Includes 
power-on reset) 


— Asynchronous, maskable, nonrecoverable 


Machine check exception—Has priority over any other pending exception 
except system reset for assertion of HRESET. Taken immediately regardless of 
recoverability. 


— Asynchronous, nonmaskable, recoverable 


System reset for SRESET—Has priority over any other pending exception 
except system reset for HRESET (or power-on reset), or machine check. Taken 
immediately when a recoverable state is reached. 


— Asynchronous, maskable, recoverable 


System management, performance monitor, thermal management, external, and 
decrementer interrupts—Before handling this type of exception, the next 
instruction in program order must complete. If that instruction causes another 
type of exception, that exception is taken and the asynchronous, maskable 
recoverable exception remains pending, until the instruction completes. Further 
instruction completion is halted. The asynchronous, maskable recoverable 
exception is taken when a recoverable state is reached. 


• Instruction-related exceptions. These exceptions are further organized by the point
in instruction processing at which they generate an exception.


— Instruction fetch 


ISI exceptions—Once this type of exception is detected, dispatching stops and 
the current instruction stream is allowed to drain out of the machine. If 
completing any of the instructions in this stream causes an exception, that 
exception is taken and the instruction fetch exception is discarded (but may be 
encountered again when instruction processing resumes). Otherwise, once all 
pending instructions have executed and a recoverable state is reached, the ISI 
exception is taken. 


— Instruction dispatch/execution 


Program, DSI, alignment, floating-point unavailable, system call, and instruction 
address breakpoint—This type of exception is determined during dispatch or 
execution of an instruction. The exception remains pending until all instructions 
before the exception-causing instruction in program order complete. The 
exception is then taken without completing the exception-causing instruction. If 
completing these previous instructions causes an exception, that exception takes
priority over the pending instruction dispatch/execution exception, which is then
discarded (but may be encountered again when instruction processing resumes).


— Post-instruction execution 


Trace—Trace exceptions are generated following execution and completion of 
an instruction while trace mode is enabled. If executing the instruction produces 
conditions for another type of exception, that exception is taken and the 
post-instruction exception is forgotten for that instruction. 


Note that these exception classifications correspond to how exceptions are prioritized, as 
described in Table 4-3. 


Table 4-3. MPC750 Exception Priorities

Priority | Exception                      | Cause

Asynchronous Exceptions (Interrupts)
0        | System reset                   | Power-on reset, assertion of HRESET and TRST (hard reset)
1        | Machine check                  | Any enabled machine check condition (L2 data parity error, assertion of TEA or
         |                                | MCP)
2        | System reset                   | Assertion of SRESET (soft reset)
3        | System management              | Assertion of SMI
4        | External interrupt             | Assertion of INT
5        | Performance monitor            | Any programmer-specified performance monitor condition
6        | Decrementer                    | Decrementer passes through zero
7        | Thermal management             | Any programmer-specified thermal management condition

Instruction Fetch Exceptions
0        | ISI                            | Any ISI exception condition

Instruction Dispatch/Execution Exceptions
0        | Instruction address breakpoint | Any instruction address breakpoint exception condition
1        | Program                        | Occurrence of an illegal instruction, privileged instruction, or trap exception
         |                                | condition. Note that floating-point enabled program exceptions have lower priority.
2        | System call                    | System Call (sc) instruction
3        | Floating-point unavailable     | Any floating-point unavailable exception condition
4        | Program                        | A floating-point enabled exception condition (lowest-priority program exception)
5        | DSI                            | DSI exception due to eciwx, ecowx with EAR[E] = 0 (DSISR[11]). Lower priority
         |                                | DSI exception conditions are shown below.
6        | Alignment                      | Any alignment exception condition, prioritized as follows:
         |                                | 1 Floating-point access not word-aligned
         |                                | 2 lmw, stmw, lwarx, stwcx. not word-aligned
         |                                | 3 eciwx or ecowx not word-aligned
         |                                | 4 Multiple or string access with MSR[LE] set
         |                                | 5 dcbz to write-through or cache-inhibited page or cache is disabled
7        | DSI                            | BAT page protection violation
8        | DSI                            | Any access except cache operations to a segment where SR[T] = 1 (DSISR[5]) or
         |                                | an access crosses from a T = 0 segment to one where T = 1 (DSISR[5])
9        | DSI                            | TLB page protection violation
10       | DSI                            | DABR address match

Post-Instruction Execution Exceptions
11       | Trace                          | MSR[SE] = 1 (or MSR[BE] = 1 for branches)





System reset and machine check exceptions may occur at any time and are not delayed even 
if an exception is being handled. As a result, state information for an interrupted exception 
may be lost; therefore, these exceptions are typically nonrecoverable. An exception may not 
be taken immediately when it is recognized. 
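
The ordering of the asynchronous entries in Table 4-3 can be modeled in a few lines. The
following Python sketch is illustrative only. The names ASYNC_PRIORITY, MASKED_BY_EE,
and next_async_exception are hypothetical, the model assumes that MSR[EE] gates all five
maskable asynchronous exceptions, and it ignores the MSR[ME] and HID0 enables as well as
the wait for a recoverable state.

```python
# Asynchronous exception priorities from Table 4-3 (lower number = higher priority).
ASYNC_PRIORITY = {
    "hard_reset": 0,
    "machine_check": 1,
    "soft_reset": 2,
    "system_management": 3,
    "external_interrupt": 4,
    "performance_monitor": 5,
    "decrementer": 6,
    "thermal_management": 7,
}

# Assumption: these are the conditions whose recognition MSR[EE] = 0 delays.
MASKED_BY_EE = {
    "system_management",
    "external_interrupt",
    "performance_monitor",
    "decrementer",
    "thermal_management",
}


def next_async_exception(pending, msr_ee):
    """Return the pending asynchronous exception to take next, or None.

    pending: iterable of names from ASYNC_PRIORITY (hypothetical model input).
    msr_ee:  current value of MSR[EE] (0 or 1).
    """
    eligible = [p for p in pending if msr_ee or p not in MASKED_BY_EE]
    return min(eligible, key=ASYNC_PRIORITY.__getitem__, default=None)
```

With MSR[EE] = 1, a pending decrementer and external interrupt resolve to the external
interrupt. With MSR[EE] = 0, only the reset and machine check entries remain eligible.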


4.3 Exception Processing


When an exception is taken, the processor uses SRR0 and SRR1 to save the contents of the
MSR for the current context and to identify where instruction execution should resume after
the exception is handled.


When an exception occurs, the address saved in SRR0 helps determine where instruction
processing should resume when the exception handler returns control to the interrupted
process. Depending on the exception, SRR0 holds either the address of the instruction that
caused the exception or the address of the next instruction in the program flow (as in the
case of a system call, trace, or trap exception). All instructions in the program flow
preceding that address will have completed execution and no subsequent instruction will
have begun execution. The SRR0 register is shown in Figure 4-1.


SRR0 (Holds EA for Instruction in Interrupted Program Flow)


Figure 4-1. Machine Status Save/Restore Register 0 (SRR0)


SRR1 is used to save machine status (selected MSR bits and possibly other status bits as
well) on exceptions and to restore those values when an rfi instruction is executed. SRR1
is shown in Figure 4-2.


Exception-Specific Information and MSR Bit Values
0                                                                                 31

Figure 4-2. Machine Status Save/Restore Register 1 (SRR1)


For most exceptions, bits 2-4 and 10-12 of SRR1 are loaded with exception-specific
information and MSR[5-9, 16-31] are placed into the corresponding bit positions of SRR1.
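
Because the PowerPC architecture numbers bit 0 as the most-significant bit, the bit ranges
above are easy to misread as masks. The following Python sketch shows one way to turn the
MSR[5-9, 16-31] ranges into a 32-bit mask. The helper names bit_mask, MSR_SAVE_MASK, and
srr1_from_msr are hypothetical and are not part of any MPC750 software interface.

```python
def bit_mask(first, last):
    """Mask for bits first..last in PowerPC numbering (bit 0 = most significant of 32)."""
    width = last - first + 1
    return ((1 << width) - 1) << (31 - last)


# MSR[5-9] and MSR[16-31] are placed into the same bit positions of SRR1.
MSR_SAVE_MASK = bit_mask(5, 9) | bit_mask(16, 31)


def srr1_from_msr(msr, exception_info=0):
    """Model SRR1 on exception entry: saved MSR bits plus exception-specific bits.

    exception_info is a hypothetical 32-bit value; only the bits outside
    MSR_SAVE_MASK are taken from it.
    """
    other_bits = ~MSR_SAVE_MASK & 0xFFFFFFFF
    return (msr & MSR_SAVE_MASK) | (exception_info & other_bits)
```

Under this numbering, MSR[EE] (bit 16) corresponds to the value 0x00008000, which lies
inside the saved range.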


The MPC750’s MSR is shown in Figure 4-3. 





[Register diagram: MSR bits 0-31, with reserved fields marked. Field names and bit
positions are listed in Table 4-4.]

Figure 4-3. Machine State Register (MSR)


The MSR bits are defined in Table 4-4. 
Table 4-4. MSR Bit Settings

Bits  | Name | Description
0     | —    | Reserved. Full function.¹
1-4   | —    | Reserved. Partial function.¹
5-9   | —    | Reserved. Full function.¹
10-12 | —    | Reserved. Partial function.¹
13    | POW  | Power management enable. Power management functions are implementation-dependent. See
      |      | Chapter 10, “Power and Thermal Management.”
      |      | 0 Power management disabled (normal operation mode).
      |      | 1 Power management enabled (reduced power mode).
14    | —    | Reserved. Implementation-specific.
15    | ILE  | Exception little-endian mode. When an exception occurs, this bit is copied into MSR[LE] to select
      |      | the endian mode for the context established by the exception.
16    | EE   | External interrupt enable
      |      | 0 The processor delays recognition of external interrupts and decrementer exception conditions.
      |      | 1 The processor is enabled to take an external interrupt or the decrementer exception.
17    | PR   | Privilege level
      |      | 0 The processor can execute both user- and supervisor-level instructions.
      |      | 1 The processor can only execute user-level instructions.
18    | FP   | Floating-point available
      |      | 0 The processor prevents dispatch of floating-point instructions, including floating-point loads,
      |      |   stores, and moves.
      |      | 1 The processor can execute floating-point instructions and can take floating-point enabled
      |      |   program exceptions.
19    | ME   | Machine check enable
      |      | 0 Machine check exceptions are disabled.
      |      | 1 Machine check exceptions are enabled.
20    | FE0  | IEEE floating-point exception mode 0 (see Table 4-5).
21    | SE   | Single-step trace enable
      |      | 0 The processor executes instructions normally.
      |      | 1 The processor generates a single-step trace exception upon the successful execution of every
      |      |   instruction except rfi, isync, and sc. Successful execution means that the instruction caused no
      |      |   other exception.
22    | BE   | Branch trace enable
      |      | 0 The processor executes branch instructions normally.
      |      | 1 The processor generates a branch type trace exception when a branch instruction executes
      |      |   successfully.
23    | FE1  | IEEE floating-point exception mode 1 (see Table 4-5).
24    | —    | Reserved. This bit corresponds to the AL bit of the POWER architecture.
25    | IP   | Exception prefix. The setting of this bit specifies whether an exception vector offset is prepended
      |      | with Fs or 0s. In the following description, nnnnn is the offset of the exception.
      |      | 0 Exceptions are vectored to the physical address 0x000n_nnnn.
      |      | 1 Exceptions are vectored to the physical address 0xFFFn_nnnn.
26    | IR   | Instruction address translation
      |      | 0 Instruction address translation is disabled.
      |      | 1 Instruction address translation is enabled.
      |      | For more information see Chapter 5, “Memory Management.”
27    | DR   | Data address translation
      |      | 0 Data address translation is disabled.
      |      | 1 Data address translation is enabled.
      |      | For more information see Chapter 5, “Memory Management.”
28    | —    | Reserved. Full function.¹
29    | PM   | Performance monitor marked mode. MPC750-specific; defined as reserved by the PowerPC
      |      | architecture. For more information about the performance monitor, see Section 4.5.13,
      |      | “Performance Monitor Interrupt (0x00F00).”
      |      | 0 Process is not a marked process.
      |      | 1 Process is a marked process.
30    | RI   | Indicates whether system reset or machine check exception is recoverable. RI indicates whether,
      |      | from the perspective of the processor, it is safe to continue (that is, processor state data such as
      |      | that saved to SRR0 is valid), but it does not guarantee that the interrupted process is recoverable.
      |      | 0 Exception is not recoverable.
      |      | 1 Exception is recoverable.
31    | LE   | Little-endian mode enable
      |      | 0 The processor runs in big-endian mode.
      |      | 1 The processor runs in little-endian mode.

¹ Full function reserved bits are saved in SRR1 when an exception occurs; partial function reserved bits are not saved.

The IEEE floating-point exception mode bits (FE0 and FE1) together define whether
floating-point exceptions are handled precisely, imprecisely, or whether they are taken at
all. As shown in Table 4-5, if either FE0 or FE1 is set, the MPC750 treats exceptions as
precise. MSR bits are guaranteed to be written to SRR1 when the first instruction of the
exception handler is encountered. For further details, see Chapter 6, “Exceptions,” of the
Programming Environments Manual.

Table 4-5. IEEE Floating-Point Exception Mode Bits

FE0 | FE1 | Mode
0   | 0   | Floating-point exceptions disabled
0   | 1   | Imprecise nonrecoverable. For this setting, the MPC750 operates in floating-point precise mode.
1   | 0   | Imprecise recoverable. For this setting, the MPC750 operates in floating-point precise mode.
1   | 1   | Floating-point precise mode
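
Table 4-5 can be read as a small lookup. The Python sketch below is a hypothetical helper
(fp_exception_mode is not an MPC750 or library function) that returns both the mode named
by the architecture and the mode the MPC750 is described as using for each setting.

```python
def fp_exception_mode(fe0, fe1):
    """Return (architectural mode, mode used by the MPC750) for MSR[FE0], MSR[FE1]."""
    architectural = {
        (0, 0): "disabled",
        (0, 1): "imprecise nonrecoverable",
        (1, 0): "imprecise recoverable",
        (1, 1): "precise",
    }[(fe0, fe1)]
    # Per Table 4-5, every enabled setting operates in precise mode on the MPC750.
    effective = "disabled" if architectural == "disabled" else "precise"
    return architectural, effective
```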


4.3.1 Enabling and Disabling Exceptions


When a condition exists that may cause an exception to be generated, it must be determined
whether the exception is enabled for that condition.

• IEEE floating-point enabled exceptions (a type of program exception) are ignored
when both MSR[FE0] and MSR[FE1] are cleared. If either bit is set, all IEEE
enabled floating-point exceptions are taken and cause a program exception.

• Asynchronous, maskable exceptions (such as the external and decrementer
interrupts) are enabled by setting MSR[EE]. When MSR[EE] = 0, recognition of
these exception conditions is delayed. MSR[EE] is cleared automatically when an
exception is taken to delay recognition of conditions causing those exceptions.

• A machine check exception can occur only if the machine check enable bit,
MSR[ME], is set. If MSR[ME] is cleared, the processor goes directly into checkstop
state when a machine check exception condition occurs. Individual machine check
exceptions can be enabled and disabled through bits in the HID0 register, which is
described in Table 4-8.

• System reset exceptions cannot be masked.


4.3.2 Steps for Exception Processing 


After it is determined that the exception can be taken (by confirming that any 
instruction-caused exceptions occurring earlier in the instruction stream have been handled, 
and by confirming that the exception is enabled for the exception condition), the processor 
does the following: 
1. SRR0 is loaded with an instruction address that depends on the type of exception.
See the individual exception description for details about how this register is used
for specific exceptions.


2. SRR1[1-4, 10-15] are loaded with information specific to the exception type.


3. SRR1[5-9, 16-31] are loaded with a copy of the corresponding MSR bits.
Depending on the implementation, reserved bits may not be copied.


4. The MSR is set as described in Table 4-6. The new values take effect as the first
instruction of the exception-handler routine is fetched.

Note that MSR[IR] and MSR[DR] are cleared for all exception types; therefore,
address translation is disabled for both instruction fetches and data accesses
beginning with the first instruction of the exception-handler routine.


5. Instruction fetch and execution resumes, using the new MSR value, at a location
specific to the exception type. The location is determined by adding the exception's
vector offset (see Table 4-2) to the base address determined by MSR[IP]. If IP is cleared,
exceptions are vectored to the physical address 0x000n_nnnn. If IP is set,
exceptions are vectored to the physical address 0xFFFn_nnnn. For a machine check
exception that occurs when MSR[ME] = 0 (machine check exceptions are
disabled), the checkstop state is entered (the machine stops executing instructions).
See Section 4.5.2, “Machine Check Exception (0x00200).”
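
The register updates made on exception entry (SRR0, SRR1, and the new MSR value) can be
sketched as a pure function. This Python model is an illustration under stated assumptions,
not a description of the hardware: the function name enter_exception and its parameters are
hypothetical, the MSR updates follow Table 4-6, and reserved bits are simply carried through.

```python
def bit(n):
    """Single-bit mask in PowerPC numbering (bit 0 is the most significant of 32)."""
    return 1 << (31 - n)


POW, ILE, EE, PR, FP, ME = (bit(n) for n in (13, 15, 16, 17, 18, 19))
FE0, SE, BE, FE1, IP = (bit(n) for n in (20, 21, 22, 23, 25))
IR, DR, PM, RI, LE = (bit(n) for n in (26, 27, 29, 30, 31))

INFO_MASK = 0x783F0000  # SRR1[1-4, 10-15]: exception-specific information
SAVE_MASK = 0x07C0FFFF  # SRR1[5-9, 16-31]: copied from the MSR
CLEARED = POW | EE | PR | FP | FE0 | SE | BE | FE1 | IR | DR | PM | RI


def enter_exception(msr, resume_address, info=0, machine_check=False):
    """Return (srr0, srr1, new_msr) for the register updates on exception entry."""
    srr0 = resume_address                          # address saved for the handler
    srr1 = (info & INFO_MASK) | (msr & SAVE_MASK)  # exception info plus saved MSR bits
    new_msr = msr & ~CLEARED                       # bits shown as 0 in Table 4-6
    if machine_check:
        new_msr &= ~ME                             # only machine check clears ME
    if msr & ILE:
        new_msr |= LE                              # LE takes the value of ILE
    else:
        new_msr &= ~LE
    return srr0, srr1, new_msr
```

In this model MSR[IP] and MSR[ILE] pass through unchanged, which matches the entries
marked as not altered in Table 4-6.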


4.3.3 Setting MSR[RI]

An operating system may handle MSR[RI] as follows:

• In the machine check and system reset exceptions—If MSR[RI] is cleared, the
exception is not recoverable. If it is set, the exception is recoverable with respect to
the processor.

• In each exception handler—When enough state information has been saved that a
machine check or system reset exception can reconstruct the previous state, set
MSR[RI].

• In each exception handler—Clear MSR[RI], set SRR0 and SRR1 appropriately, and
then execute rfi.

Note that the RI bit being set indicates that, with respect to the processor, enough
processor state data remains valid for the processor to continue, but it does not
guarantee that the interrupted process can resume.


4.3.4 Returning from an Exception Handler 


The Return from Interrupt (rfi) instruction performs context synchronization by allowing
previously issued instructions to complete before returning to the interrupted process. In
general, execution of the rfi instruction ensures the following:

• All previous instructions have completed to a point where they can no longer cause
an exception. If a previous instruction causes a direct-store interface error exception,
the results must be determined before this instruction is executed.

• Previous instructions complete execution in the context (privilege, protection, and
address translation) under which they were issued.

• The rfi instruction copies SRR1 bits back into the MSR.

• Instructions fetched after this instruction execute in the context established by this
instruction.

• Program execution resumes at the instruction indicated by SRR0.


For a complete description of context synchronization, refer to Chapter 6, “Exceptions,” of
the Programming Environments Manual.


4.4 Process Switching 


The following instructions are useful for restoring proper context during process switching: 


• The sync instruction orders the effects of instruction execution. All instructions
previously initiated appear to have completed before the sync instruction completes,
and no subsequent instructions appear to be initiated until the sync instruction
completes. For an example showing use of sync, see Chapter 2, “PowerPC Register
Set,” of the Programming Environments Manual.

• The isync instruction waits for all previous instructions to complete and then
discards any fetched instructions, causing subsequent instructions to be fetched (or
refetched) from memory and to execute in the context (privilege, translation, and
protection) established by the previous instructions.

• The stwcx. instruction clears any outstanding reservations, ensuring that an lwarx
instruction in an old process is not paired with an stwcx. instruction in a new one.


The operating system should set MSR[RI] as described in Section 4.3.3, “Setting 
MSR[RI].” 


4.5 Exception Definitions 


Table 4-6 shows all the types of exceptions that can occur with the MPC750 and MSR 
settings when the processor goes into supervisor mode due to an exception. Depending on 
the exception, certain of these bits are stored in SRR1 when an exception is taken. 


Table 4-6. MSR Setting Due to Exception

                           | MSR Bit
Exception Type             | POW | ILE | EE | PR | FP | ME | FE0 | SE | BE | FE1 | IP | IR | DR | PM | RI | LE
System reset               | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
Machine check              | 0   | —   | 0  | 0  | 0  | 0  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
DSI                        | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
ISI                        | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
External interrupt         | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
Alignment                  | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
Program                    | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
Floating-point unavailable | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
Decrementer interrupt      | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
System call                | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
Trace exception            | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
System management          | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
Performance monitor        | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE
Thermal management         | 0   | —   | 0  | 0  | 0  | —  | 0   | 0  | 0  | 0   | —  | 0  | 0  | 0  | 0  | ILE

0    Bit is cleared.
ILE  Bit is copied from the MSR[ILE].
—    Bit is not altered.

Reserved bits are read as if written as 0.


The setting of the exception prefix bit (IP) determines how exceptions are vectored. If the
bit is cleared, exceptions are vectored to the physical address 0x000n_nnnn (where nnnnn
is the vector offset); if IP is set, exceptions are vectored to physical address 0xFFFn_nnnn.
Table 4-2 shows the exception vector offset of the first instruction of the exception handler
routine for each exception type.
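
As a worked example of the vectoring rule, the Python sketch below computes the physical
address of the first handler instruction from a Table 4-2 offset and the value of MSR[IP].
The function name exception_vector_address is hypothetical, and the range check simply
reflects the highest offset listed in Table 4-2.

```python
def exception_vector_address(vector_offset, msr_ip):
    """Physical address of the first handler instruction.

    vector_offset: five-hex-digit offset from Table 4-2 (for example 0x00500).
    msr_ip:        value of MSR[IP] (0 or 1).
    """
    if not 0 <= vector_offset <= 0x02FFF:
        raise ValueError("offset outside the range listed in Table 4-2")
    prefix = 0xFFF00000 if msr_ip else 0x00000000
    return prefix | vector_offset
```

For example, an external interrupt (offset 0x00500) is vectored to 0x0000_0500 when
MSR[IP] = 0 and to 0xFFF0_0500 when MSR[IP] = 1.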


4.5.1 System Reset Exception (0x00100) 


The MPC750 implements the system reset exception as defined in the PowerPC 
architecture (OEA). The system reset exception is a nonmaskable, asynchronous exception 
signaled to the processor through the assertion of system-defined signals. In the MPC750, 
the exception is signaled by the assertion of either the SRESET or HRESET inputs, 
described more fully in Chapter 7, “Signal Descriptions.” 


Table 4-7 lists register settings when a system reset exception is taken. 


Table 4-7. System Reset Exception—Register Settings

Register | Setting Description
SRR0     | Set to the effective address of the instruction that the processor would have attempted to execute next if
         | no exception conditions were present.
SRR1     | 0      Loaded with equivalent MSR bits
         | 1-4    Cleared
         | 5-9    Loaded with equivalent MSR bits
         | 10-15  Cleared
         | 16-31  Loaded with equivalent MSR bits
         | Note that if the processor state is corrupted to the extent that execution cannot resume reliably, MSR[RI]
         | (SRR1[30]) is cleared.
MSR      | POW 0    FP  0    BE  0    DR 0
         | ILE —    ME  —    FE1 0    PM 0
         | EE  0    FE0 0    IP  —    RI 0
         | PR  0    SE  0    IR  0    LE Set to value of ILE


If SRESET is asserted, the processor is first put in a recoverable state. To do this, the
MPC750 allows any instruction at the point of completion to either complete or take an
exception, blocks completion of any following instructions, and allows the completion
queue to drain. The state before the exception occurred is then saved as specified in the
PowerPC architecture and instruction fetching begins at the system reset interrupt vector
offset, 0x00100. The vector address on a soft reset depends on the setting of MSR[IP]
(either 0x0000_0100 or 0xFFF0_0100). Soft resets are third in priority, after hard reset and
machine check. This exception is recoverable provided attaining a recoverable state does
not generate a machine check.


SRESET is an edge-sensitive signal that can be asserted and deasserted asynchronously, 
provided the minimum pulse width specified in the hardware specifications is met. 
Asserting SRESET causes the MPC750 to take a system reset exception. This exception 
modifies the MSR, SRR0, and SRR1, as described in the Programming Environments
Manual. Unlike hard reset, soft reset does not directly affect the states of output signals. 
Attempts to use SRESET during a hard reset sequence or while the JTAG logic is non-idle 
cause unpredictable results. 


A hard reset is initiated by asserting HRESET. Hard reset is used primarily for power-on 
reset (POR) (in which case TRST must also be asserted), but can also be used to restart a 
running processor. The HRESET signal must be asserted during power up and must remain 
asserted for a period that allows the PLL to achieve lock and the internal logic to be reset. 
This period is specified in the hardware specifications. The MPC750 internal state after the 
hard reset interval is defined in Table 2-19. If HRESET is asserted for less than this amount 
of time, the results are not predictable. If HRESET is asserted during normal operation, all 
operations cease and the machine state is lost. 


The MPC750 implements HID0[NHR], which helps software distinguish a hard reset from
a soft reset. Because this bit is cleared by a hard reset, but not by a soft reset, software can
set this bit after a hard reset and tell whether a subsequent reset is a hard or soft reset by
examining whether this bit is still set. See Section 2.1.2.2, “Hardware
Implementation-Dependent Register 0.”
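
The NHR convention can be illustrated with a toy model. The Python sketch below does not
access any real register. The class ResetModel and its methods are hypothetical, and the
model tracks only the NHR bit, although a real hard reset also reinitializes other state.

```python
NHR = 1 << (31 - 15)  # HID0[NHR] is bit 15 in PowerPC numbering (bit 0 = MSB)


class ResetModel:
    """Toy model of HID0[NHR] behavior across resets (not real register access)."""

    def __init__(self):
        self.hid0 = 0

    def hard_reset(self):
        self.hid0 &= ~NHR  # a hard reset clears NHR

    def soft_reset(self):
        pass  # a soft reset leaves NHR unchanged

    def reset_handler(self):
        """Classify the reset that just occurred, then set NHR for next time."""
        kind = "soft" if self.hid0 & NHR else "hard"
        self.hid0 |= NHR
        return kind
```

After the first hard reset the handler sets NHR, so a later soft reset finds the bit still
set, while a later hard reset finds it cleared again.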





4.5.2 Machine Check Exception (0x00200) 


The MPC750 implements the machine check exception as defined in the PowerPC
architecture (OEA). It conditionally initiates a machine check exception after an address or
data parity error occurs on the bus or in the L2 cache, after receiving a qualified transfer error
acknowledge (TEA) indication on the MPC750 bus, or after the machine check interrupt
(MCP) signal is asserted. As defined in the OEA, the exception is not taken if
MSR[ME] is cleared, in which case the processor enters checkstop state.


Certain machine check conditions can be enabled and disabled using HID0 bits, as
described in Table 4-8.


Table 4-8. HID0 Machine Check Enable Bits

Bits | Name | Function
0    | EMCP | Enable MCP. The primary purpose of this bit is to mask out further machine check exceptions caused
     |      | by assertion of MCP, similar to how MSR[EE] can mask external interrupts.
     |      | 0 Masks MCP. Asserting MCP does not generate a machine check exception or a checkstop.
     |      | 1 Asserting MCP causes a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
1    | DBP  | Enable/disable 60x bus address and data parity generation.
     |      | 0 If address or data parity is not used by the system and the respective parity checking is disabled
     |      |   (HID0[EBA] or HID0[EBD] = 0), input receivers for those signals are disabled, do not require pull-up
     |      |   resistors, and therefore should be left unconnected. If all parity generation is disabled, all parity
     |      |   checking should also be disabled and parity signals need not be connected.
     |      | 1 Parity generation is enabled.
2    | EBA  | Enable/disable 60x bus address parity checking.
     |      | 0 Prevents address parity checking.
     |      | 1 Allows an address parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if
     |      |   MSR[ME] = 1.
     |      | EBA and EBD allow the processor to operate with memory subsystems that do not generate parity.
3    | EBD  | Enable 60x bus data parity checking.
     |      | 0 Parity checking is disabled.
     |      | 1 Allows a data parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if
     |      |   MSR[ME] = 1.
     |      | EBA and EBD allow the processor to operate with memory subsystems that do not generate parity.
15   | NHR  | Not hard reset (software use only)
     |      | 0 A hard reset occurred if software had previously set this bit.
     |      | 1 A hard reset has not occurred.
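
The interaction between the HID0 enable bits and MSR[ME] can be summarized as a small
decision function. The Python sketch below is an illustration only: the names hid0_bit, GATE,
and machine_check_outcome are hypothetical, and the model covers just the three conditions
that Table 4-8 describes as gated by EMCP, EBA, and EBD. TEA and L2 parity errors are
deliberately left out.

```python
def hid0_bit(n):
    """Single-bit mask in PowerPC numbering (bit 0 is the most significant of 32)."""
    return 1 << (31 - n)


EMCP, EBA, EBD = hid0_bit(0), hid0_bit(2), hid0_bit(3)

# HID0 enable bit that gates each condition, per Table 4-8.
GATE = {"mcp": EMCP, "address_parity": EBA, "data_parity": EBD}


def machine_check_outcome(condition, hid0, msr_me):
    """Return 'ignored', 'machine check', or 'checkstop' for one condition."""
    if not hid0 & GATE[condition]:
        return "ignored"  # the condition is masked or not checked
    return "machine check" if msr_me else "checkstop"
```

For instance, with HID0[EMCP] = 1 an MCP assertion produces a machine check exception
when MSR[ME] = 1 and a checkstop when MSR[ME] = 0.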











A TEA indication on the bus can result from any load or store operation initiated by the 
processor. In general, TEA is expected to be used by a memory controller to indicate that a 
memory parity error or an uncorrectable memory ECC error has occurred. Note that the 
resulting machine check exception is imprecise and unordered with respect to the 
instruction that originated the bus operation. 


If MSR[ME] and the appropriate HID0 bits are set, the exception is recognized and
handled; otherwise, the processor generates an internal checkstop condition. When a 
processor is in checkstop state, instruction processing is suspended and generally cannot 
continue without restarting the processor. Note that many conditions may lead to the 
checkstop condition; the disabled machine check exception is only one of these. 


A machine check exception may result from referencing a nonexistent physical address,
either directly (with MSR[DR] = 0) or through an invalid translation. If a dcbz instruction
introduces a block into the cache associated with a nonexistent physical address, a machine
check exception can be delayed until an attempt is made to store that block to main memory.
Not all processors that implement the PowerPC architecture provide the same level of error
checking. Checkstop sources are implementation-dependent.


Machine check exceptions are enabled when MSR[ME] = 1; this is described in the
following section, Section 4.5.2.1, “Machine Check Exception Enabled (MSR[ME] = 1).”
If MSR[ME] = 0 and a machine check occurs, the processor enters the checkstop state.
Checkstop state is described in Section 4.5.2.2, “Checkstop State (MSR[ME] = 0).”


4.5.2.1 Machine Check Exception Enabled (MSR[ME] = 1)


Machine check exceptions are enabled when MSR[ME] = 1. When a machine check 
exception is taken, registers are updated as shown in Table 4-9. 


Table 4-9. Machine Check Exception—Register Settings

Register | Setting Description
SRR0     | On a best-effort basis the MPC750 can set this to an EA of some instruction that was executing or about to
         | be executing when the machine check condition occurred.
SRR1     | 0-10   Cleared
         | 11     Set when an L2 data cache parity error is detected, otherwise zero
         | 12     Set when MCP signal is asserted, otherwise zero
         | 13     Set when TEA signal is asserted, otherwise zero
         | 14     Set when a data bus parity error is detected, otherwise zero
         | 15     Set when an address bus parity error is detected, otherwise zero
         | 16-31  MSR[16-31]
MSR      | POW 0    FP  0    BE  0    DR 0
         | ILE —    ME  0    FE1 0    PM 0
         | EE  0    FE0 0    IP  —    RI 0
         | PR  0    SE  0    IR  0    LE Set to value of ILE

Note that to handle another machine check exception, the exception handler should set MSR[ME] as soon as it is
practical after a machine check exception is taken. Otherwise, subsequent machine check exceptions cause the
processor to enter the checkstop state.


The machine check exception is usually unrecoverable in the sense that execution cannot 
resume in the context that existed before the exception. If the condition that caused the 
machine check does not otherwise prevent continued execution, MSR[ME] is set to allow 
the processor to continue execution at the machine check exception vector address. 
Typically, earlier processes cannot resume; however, operating systems can use the 
machine check exception handler to try to identify and log the cause of the machine check 
condition. 


When a machine check exception is taken, instruction fetching resumes at offset 0x00200 
from the physical base address indicated by MSR[IP]. 
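

A machine check handler that logs the cause typically starts by isolating SRR1 bits 11 
through 15 from Table 4-9. The following is a minimal sketch of that step, not code from 
this manual: the macro and function names are illustrative assumptions, and the caller is 
assumed to have already read SRR1 into a 32-bit value. 

```c
#include <stdint.h>

/* PowerPC numbers bits from the most significant end: bit 0 is the MSB
 * of a 32-bit register and bit 31 is the LSB. */
#define PPC_BIT32(n) ((uint32_t)1u << (31 - (n)))

/* SRR1 cause bits listed in Table 4-9 (names are illustrative). */
#define SRR1_MC_L2_PARITY   PPC_BIT32(11)  /* L2 data cache parity error */
#define SRR1_MC_MCP         PPC_BIT32(12)  /* MCP signal asserted        */
#define SRR1_MC_TEA         PPC_BIT32(13)  /* TEA signal asserted        */
#define SRR1_MC_DATA_PARITY PPC_BIT32(14)  /* data bus parity error      */
#define SRR1_MC_ADDR_PARITY PPC_BIT32(15)  /* address bus parity error   */

#define SRR1_MC_CAUSE_MASK (SRR1_MC_L2_PARITY | SRR1_MC_MCP | SRR1_MC_TEA | \
                            SRR1_MC_DATA_PARITY | SRR1_MC_ADDR_PARITY)

/* Return only the machine check cause bits of a saved SRR1 value.
 * Bits 16-31 hold a copy of MSR[16-31] and are masked off here. */
static uint32_t mc_cause_bits(uint32_t srr1)
{
    return srr1 & SRR1_MC_CAUSE_MASK;
}
```

A handler could test the result against each macro in turn; more than one cause bit may 
be set in the same value. 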


4.5.2.2 Checkstop State (MSR[ME] = 0) 
If MSR[ME] = 0 and a machine check occurs, the processor enters the checkstop state. 


When a processor is in checkstop state, instruction processing is suspended and generally 
cannot resume without the processor being reset. The contents of all latches are frozen 
within two cycles upon entering checkstop state. 


4.5.3 DSI Exception (0x00300) 


A DSI exception occurs when no higher priority exception exists and an error condition 
related to a data memory access occurs. The DSI exception is implemented as it is defined 
in the PowerPC architecture (OEA). In case of a TLB miss for a load, store, or cache 
operation, a DSI exception is taken if the resulting hardware table search causes a page 
fault. 


On the MPC750, a DSI exception is taken when a load or store is attempted to a direct-store 
segment (SR[T] = 1). A floating-point load or store to a direct-store segment also causes a 
DSI exception on the MPC750, rather than the alignment exception specified by the 
PowerPC architecture. 


The MPC750 also implements the data address breakpoint facility, which is defined as 
optional in the PowerPC architecture and is supported by the optional data address 
breakpoint register (DABR). Although the architecture does not strictly prescribe how this 
facility must be implemented, the MPC750 follows the recommendations provided by the 
architecture and described in Chapter 2, “Programming Model,” and Chapter 6, 
“Exceptions,” in the Programming Environments Manual. 


4.5.4 ISI Exception (0x00400) 


An ISI exception occurs when no higher priority exception exists and an attempt to fetch 
the next instruction fails. This exception is implemented as it is defined by the PowerPC 
architecture (OEA), and is taken for the following conditions: 

• The effective address cannot be translated. 

• The fetch access is to a no-execute segment (SR[N] = 1). 

• The fetch access is to guarded storage and MSR[IR] = 1. 

• The fetch access is to a segment for which SR[T] is set. 

• The fetch access violates memory protection. 


When an ISI exception is taken, instruction fetching resumes at offset 0x00400 from the 
physical base address indicated by MSR[IP]. 
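

Each exception in this section resumes fetching at a fixed offset from the base selected 
by MSR[IP]. The sketch below computes that physical address. It assumes the prefix 
convention of the PowerPC OEA, in which IP = 0 selects base 0x0000_0000 and IP = 1 
selects base 0xFFF0_0000; the function name is an illustrative assumption. 

```c
#include <stdint.h>

#define MSR_IP 0x00000040u  /* MSR bit 25: exception prefix */

/* Physical address at which instruction fetching resumes for a given
 * exception vector offset (for example 0x00400 for an ISI exception).
 * Assumption: IP = 0 selects base 0x00000000, IP = 1 selects 0xFFF00000. */
static uint32_t exception_vector_address(uint32_t msr, uint32_t vector_offset)
{
    uint32_t base = (msr & MSR_IP) ? 0xFFF00000u : 0x00000000u;

    /* Vector offsets occupy the low-order 20 bits below the prefix. */
    return base | (vector_offset & 0x000FFFFFu);
}
```

For instance, with MSR[IP] = 1 the ISI handler would be fetched from 0xFFF0_0400 under 
this assumption. 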


4.5.5 External Interrupt Exception (0x00500) 


An external interrupt is signaled to the processor by the assertion of the external interrupt 
signal (INT). The INT signal is expected to remain asserted until the MPC750 takes the 
external interrupt exception. If INT is negated early, recognition of the interrupt request is 
not guaranteed. After the MPC750 begins execution of the external interrupt handler, the 
system can safely negate INT. When the MPC750 detects assertion of INT, it stops 
dispatching and waits for all pending instructions to complete. This allows any instructions 
in progress that need to take an exception to do so before the external interrupt is taken. 
After all instructions have vacated the completion buffer, the MPC750 takes the external 
interrupt exception as defined in the PowerPC architecture (OEA). 


An external interrupt may be delayed by other higher priority exceptions or if MSR[EE] is 
cleared when the exception occurs. Register settings for this exception are described in 
Chapter 6, “Exceptions,” in the Programming Environments Manual. 


When an external interrupt exception is taken, instruction fetching resumes at offset 
0x00500 from the physical base address indicated by MSR[IP]. 


4.5.6 Alignment Exception (0x00600) 


The MPC750 implements the alignment exception as defined by the PowerPC architecture 
(OEA). An alignment exception is initiated when any of the following occurs: 

• The operand of a floating-point load or store is not word-aligned. 

• The operand of lmw, stmw, lwarx, or stwcx. is not word-aligned. 

• The operand of dcbz is in a page that is write-through or cache-inhibited. 

• An attempt is made to execute dcbz when the data cache is disabled. 

• The operand of eciwx or ecowx is not word-aligned. 

• A multiple or string access is attempted with MSR[LE] set. 


Note that in the MPC750, a floating-point load or store to a direct-store segment causes a 
DSI exception rather than the alignment exception specified by the PowerPC 
architecture. For more information, see Section 4.5.3, “DSI Exception (0x00300).” 
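

Several of the conditions listed above reduce to simple checks that software (for example 
an emulator or a test generator) can make ahead of time. The sketch below covers only the 
word-alignment test and the two dcbz conditions; the function names and the boolean 
parameters describing the page and cache state are illustrative assumptions. 

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the effective address is not a multiple of 4 bytes, which is
 * the word-alignment test applied to the operands listed above. */
static bool is_word_misaligned(uint32_t ea)
{
    return (ea & 0x3u) != 0;
}

/* True if a dcbz would take an alignment exception because of the
 * target page attributes or because the data cache is disabled. */
static bool dcbz_takes_alignment_exception(bool page_write_through,
                                           bool page_cache_inhibited,
                                           bool dcache_enabled)
{
    return page_write_through || page_cache_inhibited || !dcache_enabled;
}
```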


4.5.7 Program Exception (0x00700) 


The MPC750 implements the program exception as it is defined by the PowerPC 
architecture (OEA). A program exception occurs when no higher priority exception exists 
and one or more of the exception conditions defined in the OEA occur. 


The MPC750 invokes the system illegal instruction program exception when it detects any 
instruction from the illegal instruction class. The MPC750 fully decodes the SPR field of 
the instruction. If an undefined SPR is specified, a program exception is taken. 


The UISA defines mtspr and mfspr with the record bit (Rc) set as causing a program 
exception or giving a boundedly-undefined result. In the MPC750, the appropriate 
condition register (CR) should be treated as undefined. Likewise, the PowerPC architecture 
states that the Floating Compare Unordered (fcmpu) or Floating Compare Ordered 
(fcmpo) instruction with the record bit set can either cause a program exception or provide 
a boundedly-undefined result. In the MPC750, the BF field in an instruction encoding 
for these cases is considered undefined. 


The MPC750 does not support either of the two floating-point imprecise modes supported 
by the PowerPC architecture. Unless exceptions are disabled (MSR[FE0] = MSR[FE1] = 
0), all floating-point exceptions are treated as precise. 


When a program exception is taken, instruction fetching resumes at offset 0x00700 from 
the physical base address indicated by MSR[IP]. Chapter 6, “Exceptions,” in the 
Programming Environments Manual describes register settings for this exception. 


4.5.8 Floating-Point Unavailable Exception (0x00800) 


The floating-point unavailable exception is implemented as defined in the PowerPC 
architecture. A floating-point unavailable exception occurs when no higher priority 
exception exists, an attempt is made to execute a floating-point instruction (including 
floating-point load, store, or move instructions), and the floating-point available bit in the 
MSR is cleared (MSR[FP] = 0). Register settings for this exception are described in 
Chapter 6, “Exceptions,” in the Programming Environments Manual. 


When a floating-point unavailable exception is taken, instruction fetching resumes at offset 
0x00800 from the physical base address indicated by MSR[IP]. 


4.5.9 Decrementer Exception (0x00900) 


The decrementer exception is implemented in the MPC750 as it is defined by the PowerPC 
architecture. The decrementer exception occurs when no higher priority exception exists, a 
decrementer exception condition occurs (for example, the decrementer register has 
completed decrementing), and MSR[EE] = 1. In the MPC750, the decrementer register is 
decremented at one fourth the bus clock rate. Register settings for this exception are 
described in Chapter 6, “Exceptions,” in the Programming Environments Manual. 


When a decrementer exception is taken, instruction fetching resumes at offset 0x00900 
from the physical base address indicated by MSR[IP]. 
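

Because the decrementer counts at one fourth of the bus clock rate, the number of ticks 
for a desired interval follows from simple arithmetic. The sketch below shows that 
calculation; the function names are illustrative assumptions, and the bus frequency is a 
value the system designer must supply. 

```c
#include <stdint.h>

/* Decrementer ticks per second: one fourth of the bus clock rate. */
static uint32_t dec_ticks_per_second(uint32_t bus_hz)
{
    return bus_hz / 4u;
}

/* Number of decrementer ticks in an interval given in microseconds.
 * A 64-bit intermediate avoids overflow; the result saturates at the
 * largest 32-bit value. */
static uint32_t dec_ticks_for_us(uint32_t bus_hz, uint32_t interval_us)
{
    uint64_t ticks = (uint64_t)dec_ticks_per_second(bus_hz) * interval_us / 1000000u;

    return ticks > 0xFFFFFFFFu ? 0xFFFFFFFFu : (uint32_t)ticks;
}
```

With a hypothetical 100-MHz bus, the decrementer counts 25,000,000 ticks per second, so 
a 1-ms interval corresponds to 25,000 ticks. 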


4.5.10 System Call Exception (0x00C00) 


A system call exception occurs when a System Call (sc) instruction is executed. In the 
MPC750, the system call exception is implemented as it is defined in the PowerPC 
architecture. Register settings for this exception are described in Chapter 6, “Exceptions,” 
in the Programming Environments Manual. 


When a system call exception is taken, instruction fetching resumes at offset 0x00C00 from 
the physical base address indicated by MSR[IP]. 


4.5.11 Trace Exception (0x00D00) 


The trace exception is taken if MSR[SE] = 1 or if MSR[BE] = 1 and the currently 
completing instruction is a branch. Each instruction considered during trace mode 
completes before a trace exception is taken. Register settings for this exception are 
described in Chapter 6, “Exceptions,” in the Programming Environments Manual. 


Implementation Note—The MPC750 processor diverges from the PowerPC architecture 
in that it does not take trace exceptions on the isync instruction. 


When a trace exception is taken, instruction fetching resumes at offset 0x00D00 from the 
base address indicated by MSR[IP]. 
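

The trace conditions described in this section can be summarized as a predicate evaluated 
when an instruction completes. The following sketch assumes the standard MSR bit positions 
for SE (bit 21) and BE (bit 22); the function name and the two instruction-class flags are 
illustrative assumptions. 

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_SE 0x00000400u  /* MSR bit 21: single-step trace enable */
#define MSR_BE 0x00000200u  /* MSR bit 22: branch trace enable      */

/* True if a trace exception is taken after the completing instruction. */
static bool trace_exception_after(uint32_t msr, bool is_branch, bool is_isync)
{
    if (is_isync)
        return false;  /* MPC750 does not trace the isync instruction */
    if (msr & MSR_SE)
        return true;   /* single-step: every other instruction traces */
    return (msr & MSR_BE) != 0 && is_branch;
}
```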


4.5.12 Floating-Point Assist Exception (0x00E00) 


The optional floating-point assist exception defined by the PowerPC architecture is not 
implemented in the MPC750. 


4.5.13 Performance Monitor Interrupt (0x00F00) 


The MPC750 microprocessor provides a performance monitor facility to monitor and count 
predefined events such as processor clocks, misses in either the instruction cache or the data 
cache, instructions dispatched to a particular execution unit, mispredicted branches, and 
other occurrences. The count of such events can be used to trigger the performance monitor 
exception. The performance monitor facility is not defined by the PowerPC architecture. 


The performance monitor can be used for the following: 


• To increase system performance with efficient software, especially in a 
multiprocessing system. Memory hierarchy behavior must be monitored and studied 
to develop algorithms that schedule tasks (and perhaps partition them) and that 
structure and distribute data optimally. 


• To help system developers bring up and debug their systems. 


The performance monitor uses the following SPRs: 


• The performance monitor counter registers (PMC1–PMC4) are used to record the 
number of times a certain event has occurred. UPMC1–UPMC4 provide user-level 
read access to these registers. 


• The monitor mode control registers (MMCR0–MMCR1) are used to enable various 
performance monitor interrupt functions. UMMCR0–UMMCR1 provide user-level 
read access to these registers. 


• The sampled instruction address register (SIA) contains the effective address of an 
instruction executing at or around the time that the processor signals the 
performance monitor interrupt condition. The USIA register provides user-level 
read access to the SIA. 


Table 4-10 lists register settings when a performance monitor interrupt exception is taken. 

Table 4-10. Performance Monitor Interrupt Exception—Register Settings 


Register   Setting Description 

SRR0       Set to the effective address of the instruction that the processor would have attempted to 
           execute next if no exception conditions were present. 

SRR1       0       Loaded with equivalent MSR bits 
           1-4     Cleared 
           5-9     Loaded with equivalent MSR bits 
           10-15   Cleared 
           16-31   Loaded with equivalent MSR bits 

MSR        POW  0     FP   0     BE   0     DR  0 
           ILE  —     ME   —     FE1  0     PM  0 
           EE   0     FE0  0     IP   —     RI  0 
           PR   0     SE   0     IR   0     LE  Set to value of ILE 


As with other PowerPC exceptions, the performance monitor interrupt follows the normal 
PowerPC exception model with a defined exception vector offset (0x00F00). The priority 
of the performance monitor interrupt lies between the external interrupt and the 
decrementer interrupt (see Table 4-3). The contents of the SIA are described in 
Section 2.1.2.4, “Performance Monitor Registers.” The performance monitor is described 
in Chapter 11, “Performance Monitor.” 
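

The register settings in Table 4-10 can be expressed as two mask operations on the MSR 
value at the time of the exception. The sketch below is a software model of that table, 
useful for example in a simulator; it is not a description of the hardware implementation, 
and the function names are illustrative assumptions. The MSR bit positions used for ILE, 
ME, IP, and LE are the standard ones (bits 15, 19, 25, and 31). 

```c
#include <stdint.h>

#define MSR_ILE 0x00010000u  /* MSR bit 15: exception little-endian mode */
#define MSR_ME  0x00001000u  /* MSR bit 19: machine check enable         */
#define MSR_IP  0x00000040u  /* MSR bit 25: exception prefix             */
#define MSR_LE  0x00000001u  /* MSR bit 31: little-endian mode           */

/* SRR1 bits loaded from the MSR: bit 0, bits 5-9, and bits 16-31.
 * Bits 1-4 and 10-15 are cleared. */
#define SRR1_MSR_COPY_MASK 0x87C0FFFFu

/* Value saved in SRR1 when the exception is taken. */
static uint32_t srr1_on_entry(uint32_t msr)
{
    return msr & SRR1_MSR_COPY_MASK;
}

/* New MSR value: ILE, ME, and IP are unchanged, LE takes the value of
 * ILE, and every other bit listed in the table is cleared. */
static uint32_t msr_on_entry(uint32_t msr)
{
    uint32_t next = msr & (MSR_ILE | MSR_ME | MSR_IP);

    if (msr & MSR_ILE)
        next |= MSR_LE;
    return next;
}
```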


4.5.14 Instruction Address Breakpoint Exception (0x01300) 


An instruction address breakpoint interrupt occurs when the following conditions are met: 


• The instruction breakpoint address IABR[0–29] matches EA[0–29] of the next 
instruction to complete in program order. The instruction that triggers the instruction 
address breakpoint exception is not executed before the exception handler is 
invoked. 


• The translation enable bit (IABR[TE]) matches MSR[IR]. 


• The breakpoint enable bit (IABR[BE]) is set. The address match is also reported to 
the JTAG/COP block, which may subsequently generate a soft or hard reset. The 
instruction tagged with the match does not complete before the breakpoint exception 
is taken. 
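

The three conditions above can be modeled as a single comparison. The sketch below 
assumes that IABR[BE] and IABR[TE] occupy bits 30 and 31 of the register, below the 
30-bit word address in bits 0-29; confirm the layout against Section 2.1.2.1 before 
relying on it. The macro and function names are illustrative assumptions. 

```c
#include <stdbool.h>
#include <stdint.h>

#define IABR_ADDR_MASK 0xFFFFFFFCu  /* bits 0-29: word address              */
#define IABR_BE        0x00000002u  /* bit 30: breakpoint enable (assumed)  */
#define IABR_TE        0x00000001u  /* bit 31: translation enable (assumed) */
#define MSR_IR         0x00000020u  /* MSR bit 26: instruction translation  */

/* True if the instruction at effective address ea meets all three
 * instruction address breakpoint conditions. */
static bool iabr_matches(uint32_t iabr, uint32_t msr, uint32_t ea)
{
    bool enabled    = (iabr & IABR_BE) != 0;
    bool addr_match = ((iabr ^ ea) & IABR_ADDR_MASK) == 0;
    bool te_match   = ((iabr & IABR_TE) != 0) == ((msr & MSR_IR) != 0);

    return enabled && addr_match && te_match;
}
```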


Table 4-11 lists register settings when an instruction address breakpoint exception is taken. 


Table 4-11. Instruction Address Breakpoint Exception— 
Register Settings 





Register   Setting Description 

SRR0       Set to the effective address of the instruction that the processor would have attempted to 
           execute next if no exception conditions were present. 

SRR1       0       Loaded with equivalent MSR bits 
           1-4     Cleared 
           5-9     Loaded with equivalent MSR bits 
           10-15   Cleared 
           16-31   Loaded with equivalent MSR bits 

MSR        POW  0     FP   0     BE   0     DR  0 
           ILE  —     ME   —     FE1  0     PM  0 
           EE   0     FE0  0     IP   —     RI  0 
           PR   0     SE   0     IR   0     LE  Set to value of ILE 


The MPC750 requires that an mtspr to the IABR be followed by a context-synchronizing 
instruction. The MPC750 cannot generate a breakpoint response for that 
context-synchronizing instruction if the breakpoint is enabled by the mtspr(IABR) 
immediately preceding it. The MPC750 also cannot block a breakpoint response on the 
context-synchronizing instruction if the breakpoint was disabled by the mtspr(IABR) 
instruction immediately preceding it. The format of the IABR register is shown in 
Section 2.1.2.1, “Instruction Address Breakpoint Register (IABR).” 


When an instruction address breakpoint exception is taken, instruction fetching resumes at 
offset 0x01300 from the base address indicated by MSR[IP]. 


4.5.15 System Management Interrupt (0x01400) 


The MPC750 implements a system management interrupt exception, which is not defined 
by the PowerPC architecture. The system management exception is very similar to the 
external interrupt exception and is particularly useful in implementing the nap mode. It has 
priority over an external interrupt (see Table 4-3), and it uses a different vector in the 
exception table (offset 0x01400). 


Table 4-12 lists register settings when a system management interrupt exception is taken. 


Table 4-12. System Management Interrupt Exception—Register Settings 





Register   Setting Description 

SRR0       Set to the effective address of the instruction that the processor would have attempted to 
           execute next if no exception conditions were present. 

SRR1       0       Loaded with equivalent MSR bits 
           1-4     Cleared 
           5-9     Loaded with equivalent MSR bits 
           10-15   Cleared 
           16-31   Loaded with equivalent MSR bits 

MSR        POW  0     FP   0     BE   0     DR  0 
           ILE  —     ME   —     FE1  0     PM  0 
           EE   0     FE0  0     IP   —     RI  0 
           PR   0     SE   0     IR   0     LE  Set to value of ILE 


Like the external interrupt, a system management interrupt is signaled to the MPC750 by 
the assertion of an input signal. The system management interrupt signal (SMI) is expected 
to remain asserted until the interrupt is taken. If SMI is negated early, recognition of the 
interrupt request is not guaranteed. After the MPC750 begins execution of the system 
management interrupt handler, the system can safely negate SMI. After the assertion of 
SMI is detected, the MPC750 stops dispatching instructions and waits for all pending 
instructions to complete. This allows any instructions in progress that need to take an 
exception to do so before the system management interrupt is taken. 


When a system management interrupt exception is taken, instruction fetching resumes at 
offset 0x01400 from the base address indicated by MSR[IP]. 


4.5.16 Thermal Management Interrupt Exception (0x01700) 


A thermal management interrupt is generated when the junction temperature crosses a 
threshold programmed in either THRM1 or THRM2. The exception is enabled by the TIE 
bit of either THRM1 or THRM2, and can be masked by clearing MSR[EE]. 


Table 4-13 lists register settings when a thermal management interrupt exception is taken. 
Table 4-13. Thermal Management Interrupt Exception—Register Settings 





Register   Setting Description 

SRR0       Set to the effective address of the instruction that the processor would have attempted to 
           execute next if no exception conditions were present. 

SRR1       0       Loaded with equivalent MSR bits 
           1-4     Cleared 
           5-9     Loaded with equivalent MSR bits 
           10-15   Cleared 
           16-31   Loaded with equivalent MSR bits 

MSR        POW  0     FP   0     BE   0     DR  0 
           ILE  —     ME   —     FE1  0     PM  0 
           EE   0     FE0  0     IP   —     RI  0 
           PR   0     SE   0     IR   0     LE  Set to value of ILE 


The thermal management interrupt is similar to the system management and external 
interrupts. The MPC750 requires the next instruction in program order to complete or take 
an exception, blocks completion of any following instructions, and allows the completed 
store queue to drain. Any exceptions encountered in this process are taken first and the 
thermal management interrupt exception is delayed until a recoverable halt is achieved, at 
which point the MPC750 saves the machine state, as shown in Table 4-13. When a thermal 
management interrupt exception is taken, instruction fetching resumes at offset 0x01700 
from the base address indicated by MSR[IP]. 


Chapter 10, “Power and Thermal Management,” gives details about thermal management. 


Memory Management 


This chapter describes the MPC750 microprocessor’s implementation of the memory 
management unit (MMU) specifications provided by the operating environment 
architecture (OEA) for processors that implement the PowerPC architecture. Note that the 
MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the 
MPC750 apply for the MPC755 except as noted in Appendix C, “MPC755 Embedded G3 
Microprocessor.” 


The primary function of the MMU in a processor of this family is the translation of logical 
(effective) addresses to physical addresses (referred to as real addresses in the architecture 
specification) for memory accesses and I/O accesses (I/O accesses are assumed to be 
memory-mapped). In addition, the MMU provides access protection on a segment, block, 
or page basis. This chapter describes the specific hardware used to implement the MMU 
model of the OEA in the MPC750. Refer to Chapter 7, “Memory Management,” in the 
Programming Environments Manual for a complete description of the conceptual model. 
Note that the MPC750 does not implement the optional direct-store facility and it is not 
likely to be supported in future devices. 


Two general types of memory accesses generated by processors that implement the 
PowerPC architecture require address translation—instruction accesses and data accesses 
generated by load and store instructions. Generally, the address translation mechanism is 
defined in terms of the segment descriptors and page tables defined by the PowerPC 
architecture for locating the effective-to-physical address mapping for memory accesses. 
The segment information translates the effective address to an interim virtual address, and 
the page table information translates the interim virtual address to a physical address. 


The segment descriptors, used to generate the interim virtual addresses, are stored as 
on-chip segment registers on 32-bit implementations (such as the MPC750). In addition, 
two translation lookaside buffers (TLBs) are implemented on the MPC750 to keep 
recently-used page address translations on-chip. Although the PowerPC OEA describes one 
MMU (conceptually), the MPC750 hardware maintains separate TLBs and table search 
resources for instruction and data accesses that can be performed independently (and 
simultaneously). Therefore, the MPC750 is described as having two MMUs, one for 
instruction accesses (IMMU) and one for data accesses (DMMU). 


The block address translation (BAT) mechanism is a software-controlled array that stores 
the available block address translations on-chip. BAT array entries are implemented as pairs 
of BAT registers that are accessible as supervisor special-purpose registers (SPRs). There 
are separate instruction and data BAT mechanisms, and in the MPC750, they reside in the 
instruction and data MMUs, respectively. 


The MMUs, together with the exception processing mechanism, provide the necessary 
support for the operating system to implement a paged virtual memory environment and for 
enforcing protection of designated memory areas. Exception processing is described in 
Chapter 4, “Exceptions.” Section 4.3, “Exception Processing,” describes the MSR, which 
controls some of the critical functionality of the MMUs. 


5.1 MMU Overview 


The MPC750 implements the memory management specification of the PowerPC OEA for 
32-bit implementations. Thus, it provides 4 Gbytes of effective address space accessible to 
supervisor and user programs, with a 4-Kbyte page size and 256-Mbyte segment size. In 
addition, the MMUs of 32-bit processors of this family use an interim virtual address (52 
bits) and hashed page tables in the generation of 32-bit physical addresses. These 
processors also have a BAT mechanism for mapping large blocks of memory. Block sizes 
range from 128 Kbytes to 256 Mbytes and are software-programmable. 


Basic features of the MPC750 MMU implementation defined by the OEA are as follows: 


• Support for real addressing mode—Effective-to-physical address translation can be 
disabled separately for data and instruction accesses. 


• Block address translation—Each of the BAT array entries (four IBAT entries and 
four DBAT entries) provides a mechanism for translating blocks as large as 
256 Mbytes from the 32-bit effective address space into the physical memory space. 
This can be used for translating large address ranges whose mappings do not change 
frequently. 


• Segmented address translation—The 32-bit effective address is extended to a 52-bit 
virtual address by substituting 24 upper address bits from the segment register 
for the 4 upper bits of the EA, which are used as an index into the segment 
register file. This 52-bit virtual address space is divided into 4-Kbyte pages, each of 
which can be mapped to a physical page. 
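

The segmented translation step described in the last item above amounts to a few shifts 
and masks. The sketch below forms the 52-bit virtual address from an effective address and 
the 24-bit field taken from the selected segment register (called the VSID in the 
Programming Environments Manual). The function names are illustrative assumptions, and the 
code models only the address arithmetic, not the hardware. 

```c
#include <stdint.h>

/* Index (0-15) of the segment register selected by EA[0-3]. */
static unsigned segment_index(uint32_t ea)
{
    return ea >> 28;
}

/* Build the 52-bit virtual address. Layout, most significant first:
 *   24 bits from the segment register (VSID)
 *   16 bits of page index, EA[4-19]
 *   12 bits of byte offset, EA[20-31] */
static uint64_t virtual_address(uint32_t vsid, uint32_t ea)
{
    uint64_t upper = (uint64_t)(vsid & 0x00FFFFFFu) << 28;

    return upper | (ea & 0x0FFFFFFFu);
}
```

For example, an effective address of 0x3000_1234 selects segment register 3; if that 
register supplies the 24-bit value 0xABCDEF, the virtual address is 0xA_BCDE_F000_1234. 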


The MPC750 also provides the following features that are not required by the PowerPC 
architecture: 

• Separate translation lookaside buffers (TLBs)—The 128-entry, two-way 
set-associative ITLBs and DTLBs keep recently-used page address translations 
on-chip. 

• Table search operations performed in hardware—The 52-bit virtual address is 
formed and the MMU attempts to fetch the PTE, which contains the physical 
address, from the appropriate TLB on-chip. If the translation is not found in a TLB 
(that is, a TLB miss occurs), the hardware performs a table search operation (using 
a hashing function) to search for the PTE. 

• TLB invalidation—The MPC750 implements the optional TLB Invalidate Entry 
(tlbie) and TLB Synchronize (tlbsync) instructions, which can be used to invalidate 
TLB entries. For more information on the tlbie and tlbsync instructions, see 
Section 5.4.3.2, “TLB Invalidation.” 


Table 5-1 summarizes the MPC750 MMU features, including those defined by the 
PowerPC architecture (OEA) for 32-bit processors and those specific to the MPC750. 


Table 5-1. MMU Feature Summary 





Feature Category            Architecturally Defined/    Feature 
                            MPC750-Specific 

Address ranges              Architecturally defined     2^32 bytes of effective address 
                                                        2^52 bytes of virtual address 
                                                        2^32 bytes of physical address 

Page size                   Architecturally defined     4 Kbytes 

Segment size                Architecturally defined     256 Mbytes 

Block address               Architecturally defined     Range of 128 Kbyte to 256 Mbyte sizes 
translation                                             Implemented with IBAT and DBAT registers in BAT array 

Memory protection           Architecturally defined     Segments selectable as no-execute 
                                                        Pages selectable as user/supervisor and read-only or guarded 
                                                        Blocks selectable as user/supervisor and read-only or guarded 

Page history                Architecturally defined     Referenced and changed bits defined and maintained 

Page address                Architecturally defined     Translations stored as PTEs in hashed page tables in memory 
translation                                             Page table size determined by mask in SDR1 register 

TLBs                        Architecturally defined     Instructions for maintaining TLBs (tlbie and tlbsync 
                                                        instructions in MPC750) 

                            MPC750-specific             128-entry, two-way set associative ITLB 
                                                        128-entry, two-way set associative DTLB 
                                                        LRU replacement algorithm 

Segment descriptors         Architecturally defined     Stored as segment registers on-chip (two identical copies 
                                                        maintained) 

Page table search           MPC750-specific             The MPC750 performs the table search operation in hardware. 
support 


5.1.1 Memory Addressing 


A program references memory using the effective (logical) address computed by the 
processor when it executes a load, store, branch, or cache instruction, and when it fetches 
the next instruction. The effective address is translated to a physical address according to 
the procedures described in Chapter 7, “Memory Management,” in the Programming 
Environments Manual, augmented with information in this chapter. The memory subsystem 
uses the physical address for the access. 


For a complete discussion of effective address calculation, see Section 2.3.2.3, “Effective 
Address Calculation.” 


5.1.2 MMU Organization 


Figure 5-1 shows the conceptual organization of a PowerPC MMU in a 32-bit 
implementation; note that it does not describe the specific hardware used to implement the 
memory management function for a particular processor. Processors may optionally 
implement on-chip TLBs, hardware support for the automatic search of the page tables for 
PTEs, and other hardware features (invisible to the system software) not shown. 


The MPC750 maintains two on-chip TLBs with the following characteristics: 
• 128 entries, two-way set associative (64 x 2), LRU replacement 
• Data TLB supports the DMMU; instruction TLB supports the IMMU 
• Hardware TLB update 
• Hardware update of referenced (R) and changed (C) bits in the translation table 


In the event of a TLB miss, the hardware attempts to load the TLB based on the results of 
a translation table search operation. 


Figure 5-2 and Figure 5-3 show the conceptual organization of the MPC750 instruction and 
data MMUs, respectively. The instruction addresses shown in Figure 5-2 are generated by 
the processor for sequential instruction fetches and addresses that correspond to a change 
of program flow. Data addresses shown in Figure 5-3 are generated by load, store, and 
cache instructions. 


As shown in the figures, after an address is generated, the high-order bits of the effective 
address, EA[0–19] (or a smaller set of address bits, EA[0–n], in the cases of blocks), are 
translated into physical address bits PA[0–19]. The low-order address bits, A[20–31], are 
untranslated and are therefore identical for both effective and physical addresses. After 
translating the address, the MMUs pass the resulting 32-bit physical address to the memory 
subsystem. 
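

For page translation, the split between translated and untranslated address bits can be 
sketched as follows. The function names are illustrative assumptions, and the translated 
physical page number is taken as an input here because looking it up is the job of the TLB 
or the table search. 

```c
#include <stdint.h>

#define PAGE_SHIFT       12u          /* 4-Kbyte pages                   */
#define PAGE_OFFSET_MASK 0x00000FFFu  /* A[20-31], the untranslated bits */

/* Effective page number, EA[0-19]: the bits that are translated. */
static uint32_t effective_page_number(uint32_t ea)
{
    return ea >> PAGE_SHIFT;
}

/* Byte offset within the page, A[20-31]: identical in EA and PA. */
static uint32_t page_offset(uint32_t addr)
{
    return addr & PAGE_OFFSET_MASK;
}

/* Combine a 20-bit physical page number, PA[0-19], with the
 * untranslated offset of the effective address. */
static uint32_t physical_address(uint32_t phys_page_number, uint32_t ea)
{
    return ((phys_page_number & 0x000FFFFFu) << PAGE_SHIFT) | page_offset(ea);
}
```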


The MMUs record whether the translation is for an instruction or data access, whether the 
processor is in user or supervisor mode and, for data accesses, whether the access is a load 
or a store operation. The MMUs use this information to appropriately direct the address 
translation and to enforce the protection hierarchy programmed by the operating system. 
Section 4.3, “Exception Processing,” describes the MSR, which controls some of the 
critical functionality of the MMUs. 


The figures show how address bits A[20–26] index into the on-chip instruction and data 
caches to select a cache set. The remaining physical address bits are then compared with 
the tag fields (comprised of bits PA[0–19]) of the two selected cache blocks to determine if 
a cache hit has occurred. In the case of a cache miss on the MPC750, the instruction or data 
access is then forwarded to the L2 interface tags to check for an L2 cache hit. In case of a 
miss (and in all cases of an on-chip cache miss on the MPC740) the access is forwarded to 
the bus interface unit, which initiates an external memory access. 


Figure 5-1. MMU Conceptual Block Diagram—32-Bit Implementations 


Figure 5-2. MPC750 Microprocessor IMMU Block Diagram 


Figure 5-3. MPC750 Microprocessor DMMU Block Diagram 


5.1.3 Address Translation Mechanisms 
Processors that implement the PowerPC architecture support the following three types of 
address translation: 

• Page address translation—translates the page frame address for a 4-Kbyte page size. 


• Block address translation—translates the block number for blocks that range in size 
from 128 Kbytes to 256 Mbytes. 


• Real addressing mode address translation—when address translation is disabled, the 
physical address is identical to the effective address. 
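

For the block case, the number of untranslated low-order bits depends on the block size: 
a 128-Kbyte block leaves 17 offset bits untranslated and a 256-Mbyte block leaves 28. The 
sketch below illustrates only this arithmetic. It assumes power-of-two block sizes and a 
block-aligned physical base, and it does not use the BAT register encoding; the function 
names and parameters are hypothetical. 

```c
#include <stdbool.h>
#include <stdint.h>

#define BAT_MIN_BLOCK 0x00020000u  /* 128 Kbytes = 2^17 bytes */
#define BAT_MAX_BLOCK 0x10000000u  /* 256 Mbytes = 2^28 bytes */

/* True if size is a power of two between 128 Kbytes and 256 Mbytes. */
static bool block_size_valid(uint32_t size)
{
    return size >= BAT_MIN_BLOCK && size <= BAT_MAX_BLOCK &&
           (size & (size - 1u)) == 0;
}

/* Translate an effective address that falls inside a block mapped at
 * phys_base. The low-order offset bits pass through untranslated. */
static uint32_t block_translate(uint32_t ea, uint32_t phys_base, uint32_t size)
{
    uint32_t offset_mask = size - 1u;

    return (phys_base & ~offset_mask) | (ea & offset_mask);
}
```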


Figure 5-4 shows the three address translation mechanisms provided by the MMUs. The 
segment descriptors shown in the figure control the page address translation mechanism. 
When an access uses page address translation, the appropriate segment descriptor is 
required. In 32-bit implementations, the appropriate segment descriptor is selected from the 
16 on-chip segment registers by the four highest-order effective address bits. 


A control bit in the corresponding segment descriptor then determines if the access is to 
memory (memory-mapped) or to the direct-store interface space. Note that the direct-store 
interface was present in the architecture only for compatibility with existing I/O devices 
that used this interface. However, it is being removed from the architecture, and the 
MPC750 does not support it. When an access is determined to be to the direct-store 
interface space, the MPC750 takes a DSI exception if it is a data access (see Section 4.5.3, 
“DSI Exception (0x00300)”), and takes an ISI exception if it is an instruction access (see 
Section 4.5.4, “ISI Exception (0x00400)”). 


For memory accesses translated by a segment descriptor, the interim virtual address is 
generated using the information in the segment descriptor. Page address translation 
corresponds to the conversion of this virtual address into the 32-bit physical address used 
by the memory subsystem. In most cases, the physical address for the page resides in an 
on-chip TLB and is available for quick access. However, if the page address translation 
misses in the on-chip TLB, the MMU causes a search of the page tables in memory (using 
the virtual address information and a hashing function) to locate the required physical 
address. 


Because blocks are larger than pages, block address translation translates fewer
upper-order effective address bits into physical address bits; at least 17 low-order
address bits are left untranslated and form the offset into the block. Also, instead of
segment descriptors and a TLB, block address translations use the on-chip BAT registers as
a BAT array. If an effective address matches the corresponding field of a BAT register, the
information in the BAT register is used to generate the physical address; in this case, the
results of the page translation (occurring in parallel) are ignored.
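
The arithmetic behind block address translation can be sketched as follows. This is a minimal Python model and not a description of the MPC750 hardware: the function name and its parameters are hypothetical, a block is described by its base addresses and size rather than by the actual BAT register fields, and the requirement that a block be aligned to its size is an assumption.

```python
KB = 1024
MB = 1024 * KB


def bat_translate(ea, block_ea, block_pa, block_size):
    """Return the physical address if ea lies in the block, else None.

    Hypothetical model: block_ea and block_pa are the effective and
    physical base addresses of one block, block_size its size in bytes.
    """
    # Blocks range from 128 Kbytes to 256 Mbytes; sizes are powers of two.
    if not (128 * KB <= block_size <= 256 * MB) or block_size & (block_size - 1):
        raise ValueError("unsupported block size")
    offset_mask = block_size - 1  # low-order bits that are not translated
    # Assumption: a block is aligned to its own size.
    if block_ea & offset_mask or block_pa & offset_mask:
        raise ValueError("block base must be aligned to the block size")
    if (ea & ~offset_mask & 0xFFFFFFFF) != block_ea:
        return None  # no match in this block; page translation applies
    # Upper bits come from the block, the offset passes through unchanged.
    return block_pa | (ea & offset_mask)
```

For the smallest block size, 128 Kbytes, the offset mask is 0x1FFFF, which corresponds to the 17 untranslated low-order bits mentioned above.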
Figure 5-4. Address Translation Types


When the processor generates an access, and the corresponding address translation enable 
bit in MSR is cleared, the resulting physical address is identical to the effective address and 
all other translation mechanisms are ignored. Instruction address translation and data 
address translation are enabled by setting MSR[IR] and MSR[DR], respectively. 


5.1.4 Memory Protection Facilities 


In addition to the translation of effective addresses to physical addresses, the MMUs 
provide access protection of supervisor areas from user access and can designate areas of 
memory as read-only as well as no-execute or guarded. Table 5-2 shows the protection 
options supported by the MMUs for pages. 
Table 5-2. Access Protection Options for Pages

                                              User Read           User      Supervisor Read     Supervisor
Option                                        I-Fetch   Data      Write     I-Fetch   Data      Write
Supervisor-only                               —         —         —         √         √         √
Supervisor-only-no-execute                    —         —         —         —         √         √
Supervisor-write-only                         √         √         —         √         √         √
Supervisor-write-only-no-execute              —         √         —         —         √         √
Both (user/supervisor)                        √         √         √         √         √         √
Both (user/supervisor) no-execute             —         √         √         —         √         √
Both (user/supervisor) read-only              √         √         —         √         √         —
Both (user/supervisor) read-only-no-execute   —         √         —         —         √         —

√ Access permitted
— Protection violation


The no-execute option provided in the segment register lets the operating system program 
determine whether instructions can be fetched from an area of memory. The remaining 
options are enforced based on a combination of information in the segment descriptor and 
the page table entry. Thus, the supervisor-only option allows only read and write operations 
generated while the processor is operating in supervisor mode (MSR[PR] = 0) to access the 
page. User accesses that map into a supervisor-only page cause an exception. 
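
The options in Table 5-2 can be captured in a small model. The sketch below is illustrative only: the option names, the `Option` tuple, and `is_permitted` are hypothetical, and the model describes each option by three properties (user rights, supervisor rights, no-execute) rather than by the segment descriptor and PTE encodings that actually enforce them.

```python
from collections import namedtuple

# user rights: "none", "ro", or "rw"; supervisor rights: "ro" or "rw"
Option = namedtuple("Option", "user supervisor no_execute")

OPTIONS = {
    "supervisor-only": Option("none", "rw", False),
    "supervisor-only-no-execute": Option("none", "rw", True),
    "supervisor-write-only": Option("ro", "rw", False),
    "supervisor-write-only-no-execute": Option("ro", "rw", True),
    "both": Option("rw", "rw", False),
    "both-no-execute": Option("rw", "rw", True),
    "both-read-only": Option("ro", "ro", False),
    "both-read-only-no-execute": Option("ro", "ro", True),
}


def is_permitted(option, msr_pr, access):
    """Return True if the access is permitted, False on a protection violation.

    msr_pr is MSR[PR] (0 = supervisor mode, 1 = user mode);
    access is "ifetch", "read", or "write".
    """
    opt = OPTIONS[option]
    rights = opt.user if msr_pr else opt.supervisor
    if access == "write":
        return rights == "rw"
    if access == "ifetch" and opt.no_execute:
        return False  # no-execute blocks instruction fetches in both modes
    return rights in ("ro", "rw")
```

For example, `is_permitted("supervisor-only", 1, "read")` is false because a user-mode access maps into a supervisor-only page, the case that the text above says causes an exception.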


Finally, a facility in the VEA and OEA allows pages or blocks to be designated as guarded, 
preventing out-of-order accesses that may cause undesired side effects. For example, areas 
of the memory map used to control I/O devices can be marked as guarded so accesses do 
not occur unless they are explicitly required by the program. 


For more information on memory protection, see “Memory Protection Facilities,” in
Chapter 7, “Memory Management,” in the Programming Environments Manual.


5.1.5 Page History Information 


The MMUs of these processors also define referenced (R) and changed (C) bits in the page 
address translation mechanism that can be used as history information relevant to the page. 
The operating system can use these bits to determine which areas of memory to write back 
to disk when new pages must be allocated in main memory. While these bits are initially 
programmed by the operating system into the page table, the architecture specifies that they 
can be maintained either by the processor hardware (automatically) or by some 
software-assist mechanism. 


Implementation Note—When loading the TLB, the MPC750 checks the state of the 
changed and referenced bits for the matched PTE. If the referenced bit is not set and the 
table search operation is initially caused by a load operation or by an instruction fetch, the 
MPC750 automatically sets the referenced bit in the translation table. Similarly, if the table
search operation is caused by a store operation and either the referenced bit or the changed 
bit is not set, the hardware automatically sets both bits in the translation table. In addition, 
when the address translation of a store operation hits in the DTLB, the MPC750 checks the 
state of the changed bit. If the bit is not already set, the hardware automatically updates the 
DTLB and the translation table in memory to set the changed bit. For more information, see 
Section 5.4.1, “Page History Recording.” 
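
The update rules in the implementation note can be summarized in a short model. This is a sketch under stated assumptions: both function names are hypothetical, and the bits are modeled as plain integers rather than as fields of a PTE or TLB entry.

```python
def history_after_table_search(r, c, is_store):
    """Return the (R, C) bits of the PTE after a table search operation.

    A load or instruction fetch sets R; a store sets both R and C.
    Bits that are already set stay set.
    """
    if is_store:
        return 1, 1
    return 1, c


def history_after_dtlb_store_hit(tlb_c):
    """Model a store that hits in the DTLB.

    Returns (new C bit in the DTLB entry, True if the translation table
    in memory must also be updated).
    """
    if tlb_c:
        return 1, False  # C already set: nothing to update
    return 1, True       # C clear: update the DTLB and the table in memory
```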


5.1.6 General Flow of MMU Address Translation 


The following sections describe the general flow used by processors that implement the 
PowerPC architecture to translate effective addresses to virtual and then physical addresses. 


5.1.6.1 Real Addressing Mode and Block Address 
Translation Selection 


When an instruction or data access is generated and the corresponding instruction or data 
translation is disabled (MSR[IR] = 0 or MSR[DR] = 0), real addressing mode is used 
(physical address equals effective address) and the access continues to the memory 
subsystem as described in Section 5.2, “Real Addressing Mode.” 


Figure 5-5 shows the flow the MMUs use in determining whether to select real addressing 
mode, block address translation, or the segment descriptor to select page address 
translation. 


Note that if the BAT array search results in a hit, the access is qualified with the appropriate 
protection bits. If the access violates the protection mechanism, an exception (ISI or DSI 
exception) is generated. 
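
The selection between the three mechanisms can be sketched as a simple decision function. The function and its return strings are hypothetical. The mask values assume the 32-bit MSR layout in which MSR[IR] is bit 26 and MSR[DR] is bit 27, with bit 0 as the most-significant bit; that layout is not stated in this section.

```python
# Assumed MSR bit positions (bit 0 = most-significant bit of a 32-bit register).
MSR_IR = 1 << (31 - 26)  # instruction address translation enable
MSR_DR = 1 << (31 - 27)  # data address translation enable


def select_translation(msr, is_instruction, bat_hit):
    """Return "real", "block", or "segment" for one access.

    "segment" means the segment descriptor is consulted next, which then
    selects page address translation or a direct-store segment.
    """
    enable = MSR_IR if is_instruction else MSR_DR
    if not msr & enable:
        return "real"   # translation disabled: physical address = effective address
    if bat_hit:
        return "block"  # a BAT array hit takes precedence over page translation
    return "segment"
```

Because instruction and data translation are enabled separately, the same MSR value can select real addressing mode for a fetch and translated addressing for a load or store.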
Figure 5-5. General Flow of Address Translation (Real Addressing Mode and Block)


5.1.6.2 Page Address Translation Selection 


If address translation is enabled and the effective address information does not match a BAT 
array entry, the segment descriptor must be located. When the segment descriptor is located, 
the T bit in the segment descriptor selects whether the translation is to a page or to a 
direct-store segment as shown in Figure 5-6. For 32-bit implementations, the segment 
descriptor for an access is contained in one of 16 on-chip segment registers; effective 
address bits EA[0–3] select one of the 16 segment registers.


Note that the MPC750 does not implement the direct-store interface, and accesses to these
segments cause a DSI or ISI exception. Figure 5-6 also shows the way in which
the no-execute protection is enforced; if the N bit in the segment descriptor is set and the
access is an instruction fetch, the access is faulted as described in Chapter 7, “Memory
Management,” in the Programming Environments Manual. Note that the figure shows the
flow for these cases as described by the PowerPC OEA, and so the TLB references are
shown as optional. Because the MPC750 implements TLBs, these branches are valid and
are described in more detail throughout this chapter.
Figure 5-6. General Flow of Page and Direct-Store Interface Address Translation


If SR[T] = 0, page address translation is selected. The information in the segment descriptor 
is then used to generate the 52-bit virtual address. The virtual address is then used to 
identify the page address translation information (stored as page table entries (PTEs) in a 
page table in memory). For increased performance, the MPC750 has two on-chip TLBs to
cache recently used translations on-chip.
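
The construction of the virtual address can be illustrated with a few lines of bit arithmetic. This is a sketch only: the function names are hypothetical, and the split of the 52-bit virtual address into a 24-bit virtual segment ID (VSID), a 16-bit page index, and a 12-bit byte offset is inferred from the 4-Kbyte page size and the 16 segments rather than quoted from this section.

```python
def segment_register_index(ea):
    """EA[0-3], the four highest-order bits, select one of 16 segment registers."""
    return (ea >> 28) & 0xF


def virtual_address(ea, vsid):
    """Build the 52-bit virtual address for a 32-bit effective address.

    vsid is the (assumed 24-bit) virtual segment ID taken from the selected
    segment register.
    """
    page_index = (ea >> 12) & 0xFFFF  # EA[4-19]
    byte_offset = ea & 0xFFF          # EA[20-31], offset within a 4-Kbyte page
    return ((vsid & 0xFFFFFF) << 28) | (page_index << 12) | byte_offset
```

With these widths, the upper 40 bits of the result (VSID and page index) form the virtual page number that is looked up in the TLB and the page tables.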


If an access hits in the appropriate TLB, page translation succeeds and the physical address 
bits are forwarded to the memory subsystem. If the required translation is not resident, the 
MMU performs a search of the page table. If the required PTE is found, a TLB entry is 
allocated and the page translation is attempted again. This time, the TLB is guaranteed to 
hit. When the translation is located, the access is qualified with the appropriate protection 
bits. If the access causes a protection violation, either an ISI or DSI exception is generated. 


If the PTE is not found by the table search operation, a page fault condition exists, and an 
ISI or DSI exception occurs so software can handle the page fault. 
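
The hit, miss, and page fault cases above can be sketched as follows. This is a deliberately simplified model: the TLB and the page table are plain dictionaries keyed by virtual page number, the names are hypothetical, and neither the hashed table search nor the protection check is modeled.

```python
class PageFault(Exception):
    """Raised when no PTE exists; stands in for the ISI or DSI exception."""


def translate_page(vpn, tlb, page_table):
    """Return the physical page number for a virtual page number.

    tlb and page_table are dicts mapping virtual page number to physical
    page number (a hypothetical stand-in for the real structures).
    """
    if vpn not in tlb:
        # TLB miss: search the page table in memory.
        if vpn not in page_table:
            raise PageFault(hex(vpn))
        # PTE found: allocate a TLB entry, then retry the translation.
        tlb[vpn] = page_table[vpn]
    # At this point the TLB lookup is guaranteed to hit.
    return tlb[vpn]
```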


5.1.7 MMU Exceptions Summary 


To complete any memory access, the effective address must be translated to a physical 
address. As specified by the architecture, an MMU exception condition occurs if this 
translation fails for one of the following reasons: 


• Page fault—there is no valid entry in the page table for the page specified by the
effective address (and segment descriptor) and there is no valid BAT translation.


• An address translation is found but the access is not allowed by the memory
protection mechanism.


The translation exception conditions defined by the OEA for 32-bit implementations cause 
either the ISI or the DSI exception to be taken as shown in Table 5-3. 


The state saved by the processor for each of these exceptions contains information that 
identifies the address of the failing instruction. Refer to Chapter 4, “Exceptions,” for a more 
detailed description of exception processing. 


Table 5-3. Translation Exception Conditions

Condition                           Description                                       Exception
Page fault (no PTE found)           No matching PTE found in page tables (and no      I access: ISI exception,
                                    matching BAT array entry)                         SRR1[1] = 1
                                                                                      D access: DSI exception,
                                                                                      DSISR[1] = 1
Block protection violation          Conditions described for block in “Block Memory   I access: ISI exception,
                                    Protection” in Chapter 7, “Memory Management,”    SRR1[4] = 1
                                    in the Programming Environments Manual.           D access: DSI exception,
                                                                                      DSISR[4] = 1
Page protection violation           Conditions described for page in “Page Memory     I access: ISI exception,
                                    Protection” in Chapter 7, “Memory Management,”    SRR1[4] = 1
                                    in the Programming Environments Manual.           D access: DSI exception,
                                                                                      DSISR[4] = 1
No-execute protection violation     Attempt to fetch instruction when SR[N] = 1       ISI exception,
                                                                                      SRR1[3] = 1
Instruction fetch from direct-store Attempt to fetch instruction when SR[T] = 1       ISI exception,
segment                                                                               SRR1[3] = 1
Data access to direct-store segment Attempt to perform load or store (including FP    DSI exception,
(including floating-point accesses) load or store) when SR[T] = 1                     DSISR[5] = 1
Instruction fetch from guarded      Attempt to fetch instruction when MSR[IR] = 1     ISI exception,
memory                              and either matching xBAT[G] = 1, or no matching   SRR1[3] = 1
                                    BAT entry and PTE[G] = 1
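
An exception handler usually tests these status bits with masks. The sketch below shows how the bit numbers in Table 5-3 map to mask values; it assumes the bit-numbering convention in which bit 0 is the most-significant bit of a 32-bit register (the convention under which EA[0–3] are the four highest-order address bits). The function and dictionary names are hypothetical.

```python
def status_bit_mask(bit):
    """Mask for bit n of a 32-bit register, where bit 0 is the most significant."""
    return 1 << (31 - bit)


# (condition, access type) -> (exception, status register, bit number)
TRANSLATION_EXCEPTIONS = {
    ("page fault", "I"): ("ISI", "SRR1", 1),
    ("page fault", "D"): ("DSI", "DSISR", 1),
    ("block protection violation", "I"): ("ISI", "SRR1", 4),
    ("block protection violation", "D"): ("DSI", "DSISR", 4),
    ("page protection violation", "I"): ("ISI", "SRR1", 4),
    ("page protection violation", "D"): ("DSI", "DSISR", 4),
    ("no-execute protection violation", "I"): ("ISI", "SRR1", 3),
    ("direct-store segment", "I"): ("ISI", "SRR1", 3),
    ("direct-store segment", "D"): ("DSI", "DSISR", 5),
    ("guarded memory", "I"): ("ISI", "SRR1", 3),
}


def is_page_fault(dsisr):
    """True if DSISR reports that no matching PTE was found (DSISR[1] = 1)."""
    return bool(dsisr & status_bit_mask(1))
```

Under this convention DSISR[1] corresponds to the mask 0x40000000 and DSISR[4] to 0x08000000.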








In addition to the translation exceptions, there are other MMU-related conditions (some of 
them defined as implementation-specific, and therefore not required by the architecture) 
that can cause an exception to occur. These exception conditions map to processor 
exceptions as shown in Table 5-4. The only MMU exception conditions that occur when 
MSR[DR] = 0 are those that cause an alignment exception for data accesses. For more 
detailed information about the conditions that cause an alignment exception (in particular 
for string/multiple instructions), see Section 4.5.6, “Alignment Exception (0x00600).” 


Note that some exception conditions depend upon whether the memory area is set up as
write-through (W = 1) or cache-inhibited (I = 1). These bits are described fully in
“Memory/Cache Access Attributes,” in Chapter 5, “Cache Model and Memory Coherency,” 
of the Programming Environments Manual. Refer to Chapter 4, “Exceptions,” and to 
Chapter 6, “Exceptions,” in the Programming Environments Manual for a complete 
description of the SRR1 and DSISR bit settings for these exceptions. 


The LSU initiates out-of-order accesses without knowledge of whether it is legal to do so. 
However, the MMU does not perform a hardware table search due to TLB misses until the 
request is required by the program flow. In these out-of-order cases, the MMU detects 
protection violations and whether a dcbz instruction specifies a page marked as 
write-through or cache-inhibited. The MMU also detects alignment exceptions caused by 
the dcbz instruction and prevents the changed bit in the PTE from being updated 
erroneously in these cases. 


Table 5-4. Other MMU Exception Conditions for the MPC750 Processor

Condition                             Description                                   Exception
dcbz with W = 1 or I = 1              dcbz instruction to write-through or          Alignment exception (not required by
                                      cache-inhibited segment or block              architecture for this condition)
lwarx, stwcx., eciwx, or ecowx        Reservation instruction or external control   DSI exception,
instruction to direct-store segment   instruction when SR[T] = 1                    DSISR[5] = 1
Floating-point load or store to       FP memory access when SR[T] = 1               See data access to direct-store
direct-store segment                                                                segment in Table 5-3.
Load or store that results in a       Does not occur in MPC750                      Does not apply
direct-store error
eciwx or ecowx attempted when         eciwx or ecowx attempted with EAR[E] = 0      DSI exception,
external control facility disabled                                                  DSISR[11] = 1
lmw, stmw, lswi, lswx, stswi, or      lmw, stmw, lswi, lswx, stswi, or stswx        Alignment exception
stswx instruction attempted in        instruction attempted while MSR[LE] = 1
little-endian mode
Operand misalignment                  Translation enabled and a floating-point      Alignment exception (some of these
                                      load/store, stmw, stwcx., lmw, lwarx, eciwx,  cases are implementation-specific)
                                      or ecowx instruction operand is not
                                      word-aligned

















5.1.8 MMU Instructions and Register Summary 


The MMU instructions and registers allow the operating system to set up the block address 
translation areas and the page tables in memory. 


Note that because the implementation of TLBs is optional, the instructions that refer to 
these structures are also optional. However, as these structures serve as caches of the page 
table, the architecture specifies a software protocol for maintaining coherency between 
these caches and the tables in memory whenever the tables in memory are modified. When 
the tables in memory are changed, the operating system purges these caches of the 
corresponding entries, allowing the translation caching mechanism to refetch from the 
tables when the corresponding entries are required. 


Note that the MPC750 implements all TLB-related instructions except tlbia, which is 
treated as an illegal instruction. 


Because the MMU specification for these processors is so flexible, it is recommended that 
the software that uses these instructions and registers be encapsulated into subroutines to 
minimize the impact of migrating across the family of implementations. 


Table 5-5 summarizes MPC750 instructions that specifically control the MMU. For more 
detailed information about the instructions, refer to Chapter 2, “Programming Model,” in 
this book and Chapter 8, “Instruction Set,” in the Programming Environments Manual.
Table 5-5. MPC750 Microprocessor Instruction Summary—Control MMUs

Instruction       Description
mtsr SR,rS        Move to Segment Register
                  SR[SR#] ← rS
mtsrin rS,rB      Move to Segment Register Indirect
                  SR[rB[0–3]] ← rS
mfsr rD,SR        Move from Segment Register
                  rD ← SR[SR#]
mfsrin rD,rB      Move from Segment Register Indirect
                  rD ← SR[rB[0–3]]
tlbie rB*         TLB Invalidate Entry
                  For effective address specified by rB, TLB[V] ← 0
                  The tlbie instruction invalidates all TLB entries indexed by the EA, and operates on
                  both the instruction and data TLBs simultaneously, invalidating four TLB entries. The
                  index corresponds to bits 14–19 of the EA.
                  Software must ensure that instruction fetches or memory references to the virtual
                  pages specified by the tlbie instruction have been completed prior to executing the
                  tlbie instruction.
tlbsync*          TLB Synchronize
                  Synchronizes the execution of all other tlbie instructions in the system. In the
                  MPC750, when the TLBISYNC signal is negated, instruction execution may continue or
                  resume after the completion of a tlbsync instruction. When the TLBISYNC signal is
                  asserted, instruction execution stops after the completion of a tlbsync instruction.
                  See Section 8.8.2, “TLBISYNC Input” for more information.

* These instructions are defined by the PowerPC architecture, but are optional.
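
The effect of tlbie described in Table 5-5 can be sketched with a small software model. The data layout here is hypothetical: each TLB is a list of 64 sets, and each set holds two entries represented as dictionaries with a `valid` flag. The shift amount assumes that bit 0 is the most-significant bit of the 32-bit effective address, so that bits 14–19 are the six bits just above the 12-bit page offset.

```python
def tlbie_set_index(ea):
    """Extract EA bits 14-19 (bit 0 = most significant), the TLB set index."""
    return (ea >> 12) & 0x3F


def make_tlb():
    """Build a hypothetical TLB model: 64 sets of two valid entries each."""
    return [[{"valid": True} for _ in range(2)] for _ in range(64)]


def tlbie(ea, itlb, dtlb):
    """Invalidate both entries of the indexed set in both TLBs.

    Returns the number of entries invalidated.
    """
    index = tlbie_set_index(ea)
    invalidated = 0
    for tlb in (itlb, dtlb):
        for entry in tlb[index]:
            entry["valid"] = False
            invalidated += 1
    return invalidated
```

Two entries per set in each of the two TLBs gives the four entries that a single tlbie invalidates.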


Table 5-6 summarizes the registers that the operating system uses to program the MPC750 
MMUs. These registers are accessible to supervisor-level software only. These registers are 
described in Chapter 2, “Programming Model.” 


Table 5-6. MPC750 Microprocessor MMU Registers

Register                Description
Segment registers       The sixteen 32-bit segment registers are present only in 32-bit implementations of
(SR0–SR15)              the PowerPC architecture. The fields in the segment register are interpreted
                        differently depending on the value of bit 0. The segment registers are accessed by
                        the mtsr, mtsrin, mfsr, and mfsrin instructions.
BAT registers           There are 16 BAT registers, organized as four pairs of instruction BAT registers
(IBAT0U–IBAT3U,         (IBAT0U–IBAT3U paired with IBAT0L–IBAT3L) and four pairs of data BAT registers
IBAT0L–IBAT3L,          (DBAT0U–DBAT3U paired with DBAT0L–DBAT3L). The BAT registers are defined as
DBAT0U–DBAT3U, and      32-bit registers in 32-bit implementations. These are special-purpose registers that
DBAT0L–DBAT3L)          are accessed by the mtspr and mfspr instructions.
SDR1                    The SDR1 register specifies the variables used in accessing the page tables in
                        memory. SDR1 is defined as a 32-bit register for 32-bit implementations. This
                        special-purpose register is accessed by the mtspr and mfspr instructions.














If an MMU register is being accessed by an instruction in the instruction stream, the IMMU 
stalls for one translation cycle to perform that operation. The sequencer serializes 
instructions to ensure the data correctness. For updating the IBATs and SRs, the sequencer 
classifies those operations as fetch serializing. After such an instruction is dispatched, the 
instruction buffer is flushed and the fetch stalls until the instruction completes. However,
for reading from the IBATs, the operation is classified as execution serializing. As long as 
the LSU ensures that all previous instructions can be executed, subsequent instructions can 
be fetched and dispatched. 


5.2 Real Addressing Mode 


If address translation is disabled (MSR[IR] = 0 or MSR[DR] = 0) for a particular access, 
the effective address is treated as the physical address and is passed directly to the memory 
subsystem as described in Chapter 7, “Memory Management,” in the Programming
Environments Manual. 


Note that the default WIMG bits (0b0011) cause data accesses to be considered cacheable
(I = 0) and thus load and store accesses are weakly ordered. This is the case even if the data
cache is disabled in the HID0 register (as it is out of hard reset). If I/O devices require load
and store accesses to occur in strict program order (strongly ordered), translation must be
enabled so that the corresponding I bit can be set. Note also that the G bit must be set to
ensure that the accesses are strongly ordered. For instruction accesses, the default memory
access mode bits (WIMG) are also 0b0011. That is, instruction accesses are considered
cacheable (I = 0), and the memory is guarded. Again, instruction accesses are considered
cacheable even if the instruction cache is disabled in the HID0 register (as it is out of hard
reset). The W and M bits have no effect on the instruction cache.
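
The default value 0b0011 can be decoded bit by bit. The helper below is a hypothetical illustration that treats the four bits in the order W, I, M, G, from most significant to least significant.

```python
def decode_wimg(wimg):
    """Split a 4-bit WIMG value into its W, I, M, and G bits."""
    return {
        "W": (wimg >> 3) & 1,  # write-through
        "I": (wimg >> 2) & 1,  # cache-inhibited
        "M": (wimg >> 1) & 1,  # memory coherency required
        "G": wimg & 1,         # guarded
    }


REAL_MODE_DEFAULT = decode_wimg(0b0011)
```

Decoding the default gives W = 0, I = 0, M = 1, and G = 1, which matches the description above: accesses are cacheable (I = 0) and the memory is guarded.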


For information on the synchronization requirements for changes to MSR[IR] and 
MSR[DR], refer to Section 2.3.2.4, “Synchronization,” in this manual, and
“Synchronization Requirements for Special Registers and for Lookaside Buffers” in 
Chapter 2, “PowerPC Register Set,” in the Programming Environments Manual. 


5.3 Block Address Translation


The block address translation (BAT) mechanism in the OEA provides a way to map ranges 
of effective addresses larger than a single page into contiguous areas of physical memory. 
Such areas can be used for data that is not subject to normal virtual memory handling 
(paging), such as a memory-mapped display buffer or an extremely large array of numerical 
data. 


Block address translation in the MPC750 is described in Chapter 7, “Memory 
Management,” in the Programming Environments Manual for 32-bit implementations. 


Implementation Note—The MPC750 BAT registers are not initialized by the hardware
after the power-up or reset sequence. Consequently, all valid bits in both instruction and
data BAT areas must be explicitly cleared before setting any BAT area for the first time and
before enabling translation. Also, note that software must avoid overlapping blocks while
updating a BAT area or areas. Even if translation is disabled, multiple BAT area hits (with
the valid bits set) can corrupt the remaining portion (any bits except the valid bits) of the
BAT registers.


Thus, multiple BAT hits (with valid bits set) are considered a programming error whether
translation is enabled or disabled. They can lead to unpredictable results if translation is
enabled or, if translation is disabled, once translation is eventually enabled. For unused
BATs (if translation is to be enabled), it is sufficient to clear the valid bits of the unused
BAT entries.
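
System software can check a planned BAT configuration for overlaps before writing any register. The sketch below is a host-side consistency check only: the function name is hypothetical, each valid BAT area is described as a (base effective address, size in bytes) pair, and the check is assumed to be run separately for the instruction and the data BAT arrays.

```python
def overlapping_blocks(blocks):
    """Return index pairs of BAT areas whose effective address ranges overlap.

    blocks is a list of (base effective address, size in bytes) tuples for
    the valid areas of one BAT array (hypothetical representation).
    """
    overlaps = []
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            base_i, size_i = blocks[i]
            base_j, size_j = blocks[j]
            # Two half-open ranges overlap if each starts before the other ends.
            if base_i < base_j + size_j and base_j < base_i + size_i:
                overlaps.append((i, j))
    return overlaps
```

An empty result means that no effective address can hit more than one valid BAT area in that array.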


5.4 Memory Segment Model 


The MPC750 adheres to the memory segment model as defined in Chapter 7, “Memory 
Management,” in the Programming Environments Manual for 32-bit implementations. 
Memory in the PowerPC OEA is divided into 256-Mbyte segments. This segmented 
memory model provides a way to map 4-Kbyte pages of effective addresses to 4-Kbyte 
pages in physical memory (page address translation), while providing the programming 
flexibility afforded by a large virtual address space (52 bits). 


The segment/page address translation mechanism may be superseded by the block address 
translation (BAT) mechanism described in Section 5.3, “Block Address Translation.” If not,
the translation proceeds in the following two steps: 


1. from effective address to the virtual address (which never exists as a specific entity 
but can be considered to be the concatenation of the virtual page number and the byte 
offset within a page), and 


2. from virtual address to physical address. 


This section highlights those areas of the memory segment model defined by the OEA that 
are specific to the MPC750. 


5.4.1 Page History Recording


Referenced (R) and changed (C) bits in each PTE keep history information about the page. 
They are maintained by a combination of the MPC750 table search hardware and the 
system software. The operating system uses this information to determine which areas of 
memory to write back to disk when new pages must be allocated in main memory. 
Referenced and changed recording is performed only for accesses made with page address 
translation and not for translations made with the BAT mechanism or for accesses that 
correspond to direct-store (T = 1) segments. Furthermore, R and C bits are maintained only 
for accesses made while address translation is enabled (MSR[IR] = 1 or MSR[DR] = 1). 


In the MPC750, the referenced and changed bits are updated as follows: 
• For TLB hits, the C bit is updated according to Table 5-7.
• For TLB misses, when a table search operation is in progress to locate a PTE, the
R and C bits are updated (set, if required) to reflect the status of the page based on
this access.


Table 5-7. Table Search Operations to Update History Bits—TLB Hit Case

R and C Bits in TLB Entry   Processor Action
00                          Combination doesn’t occur
01                          Combination doesn’t occur
10                          Read: No special action
                            Write: The MPC750 initiates a table search operation to update C.
11                          No special action for read or write














The table shows that the status of the C bit in the TLB entry (in the case of a TLB hit) is 
what causes the processor to update the C bit in the PTE (the R bit is assumed to be set in 
the page tables if there is a TLB hit). Therefore, when software clears the R and C bits in 
the page tables in memory, it must invalidate the TLB entries associated with the pages 
whose referenced and changed bits were cleared. 
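
The need to invalidate the TLB after clearing the history bits can be demonstrated with a small model. Everything in this sketch is hypothetical: the TLB holds at most one cached entry, and a PTE is a dictionary with `R` and `C` keys. It shows only why a stale TLB entry hides later stores from the page table.

```python
def store(tlb, pte):
    """Model a store to the page described by pte.

    tlb is a dict holding at most one cached entry under the key "entry".
    """
    entry = tlb.get("entry")
    if entry is None:
        # TLB miss: the table search sets R and C, then the TLB is loaded.
        pte["R"], pte["C"] = 1, 1
        tlb["entry"] = dict(pte)
    elif entry["C"] == 0:
        # TLB hit with C clear: a table search updates C in the PTE.
        pte["C"] = 1
        entry["C"] = 1
    # TLB hit with C already set: no special action, the PTE is not touched.


def clear_history(tlb, pte, invalidate):
    """Software clears R and C in the page table, optionally invalidating the TLB."""
    pte["R"], pte["C"] = 0, 0
    if invalidate:
        tlb.pop("entry", None)
```

If `clear_history` is called with `invalidate=False`, the next store hits a TLB entry whose C bit is still set and the PTE keeps C = 0, so the operating system would wrongly treat the page as unmodified.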


The dcbt and dcbtst instructions can execute if there is a TLB/BAT hit or if the processor
is in real addressing mode. In case of a TLB or BAT miss, these instructions are treated as 
no-ops; they do not initiate a table search operation and they do not set either the R or C bits. 


As defined by the PowerPC architecture, the referenced and changed bits are updated as if 
address translation were disabled (real addressing mode). If these update accesses hit in the 
data cache, they are not seen on the external bus. If they miss in the data cache, they are 
performed as typical cache line fill accesses on the bus (assuming the data cache is 
enabled). 


5.4.1.1 Referenced Bit


The referenced (R) bit of a page is located in the PTE in the page table. Every time a page 
is referenced (with a read or write access) and the R bit is zero, the MPC750 sets the R bit 
in the page table. The OEA specifies that the referenced bit may be set immediately, or the
setting may be delayed until the memory access is determined to be successful. Because the 
reference to a page is what causes a PTE to be loaded into the TLB, the referenced bit in all 
MPC750 TLB entries is effectively always set. The processor never automatically clears the 
referenced bit. 


The referenced bit is only a hint to the operating system about the activity of a page. At 
times, the referenced bit may be set although the access was not logically required by the 
program or even if the access was prevented by memory protection. Examples of this 
include the following: 
• Fetching of instructions not subsequently executed


• A memory reference caused by a speculatively executed instruction that is
mispredicted


• Accesses generated by an lswx or stswx instruction with a zero length


• Accesses generated by an stwcx. instruction when no store is performed because a
reservation does not exist


• Accesses that cause exceptions and are not completed


5.4.1.2 Changed Bit 


The changed bit of a page is located both in the PTE in the page table and in the copy of the 
PTE loaded into the TLB (if a TLB is implemented, as in the MPC750). Whenever a data 
store instruction is executed successfully, if the TLB search (for page address translation) 
results in a hit, the changed bit in the matching TLB entry is checked. If it is already set, it 
is not updated. If the TLB changed bit is 0, the MPC750 initiates the table search operation 
to set the C bit in the corresponding PTE in the page table. The MPC750 then reloads the 
TLB (with the C bit set). 


The changed bit (in both the TLB and the PTE in the page tables) is set only when a store 
operation is allowed by the page memory protection mechanism and the store is guaranteed 
to be in the execution path (unless an exception, other than those caused by the sc, rfi, or 
trap instructions, occurs). Furthermore, the following conditions may cause the C bit to be 
set: 


• The execution of an stwcx. instruction is allowed by the memory protection
mechanism but a store operation is not performed.


• The execution of an stswx instruction is allowed by the memory protection
mechanism but a store operation is not performed because the specified length is
zero.


• The store operation is not performed because an exception occurs before the store is
performed.


Again, note that although the execution of the dcbt and dcbtst instructions may cause the
R bit to be set, they never cause the C bit to be set. 


5.4.1.3 Scenarios for Referenced and Changed Bit Recording


This section provides a summary of the model (defined by the OEA) that is used by the
processors for maintaining the referenced and changed bits. In some scenarios, the bits are
guaranteed to be set by the processor; in some scenarios, the architecture allows that the bits
may be set (not absolutely required); and in some scenarios, the bits are guaranteed to not
be set. Note that when the MPC750 updates the R and C bits in memory, the accesses are
performed as if MSR[DR] = 0 and G = 0 (that is, as nonguarded cacheable operations in
which coherency is required).


Chapter 5. Memory Management 


For More Information On This Product, 
Go to: www.freescale.com 


Freescale Semiconductor, Inc. 
Memory Segment Model 


Table 5-8 defines a prioritized list of the R and C bit settings for all scenarios. The entries 
in the table are prioritized from top to bottom, such that a matching scenario occurring 
closer to the top of the table takes precedence over a matching scenario closer to the bottom 
of the table. For example, if an stwcx. instruction causes a protection violation and there is
no reservation, the C bit is not altered, as shown for the protection violation case. Note that 
in the table, load operations include those generated by load instructions, by the eciwx 
instruction, and by the cache management instructions that are treated as a load with respect 
to address translation. Similarly, store operations include those operations generated by 
store instructions, by the ecowx instruction, and by the cache management instructions that 
are treated as a store with respect to address translation. 


Table 5-8. Model for Guaranteed R and C Bit Settings

                                                                Causes Setting of R Bit   Causes Setting of C Bit
Priority  Scenario                                              OEA          MPC750       OEA          MPC750
1         No-execute protection violation                       No           No           No           No
2         Page protection violation                             Maybe        Yes          No           No
3         Out-of-order instruction fetch or load operation      Maybe        No           No           No
4         Out-of-order store operation. Would be required by    Maybe¹       No           No           No
          the sequential execution model in the absence of
          system-caused or imprecise exceptions, or of
          floating-point assist exception for instructions
          that would cause no other kind of precise exception.
5         All other out-of-order store operations               Maybe¹       No           Maybe¹       No
6         Zero-length load (lswx)                               Maybe        No           No           No
7         Zero-length store (stswx)                             Maybe¹       No           Maybe¹       No
8         Store conditional (stwcx.) that does not store        Maybe¹       Yes          Maybe¹       Yes
9         In-order instruction fetch                            Yes²         Yes          No           No
10        Load instruction or eciwx                             Yes          Yes          No           No
11        Store instruction, ecowx or dcbz instruction          Yes          Yes          Yes          Yes
12        icbi, dcbt, or dcbtst instruction                     Maybe        No           No           No
13        dcbst or dcbf instruction                             Maybe        Yes          No           No
14        dcbi instruction                                      Maybe¹       Yes          Maybe¹       Yes

Notes:
¹ If C is set, R is guaranteed to be set also.
² Includes the case in which the instruction is fetched out of order and R is not set (does not apply for MPC750).
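
The priority rule of Table 5-8 can be expressed as a first-match lookup. The sketch below encodes only the MPC750 columns; the scenario strings and the function name are hypothetical shorthand for the table rows, not identifiers from the architecture.

```python
# (priority, scenario, R set by MPC750, C set by MPC750), highest priority first
MPC750_RC = [
    (1, "no-execute protection violation", False, False),
    (2, "page protection violation", True, False),
    (3, "out-of-order instruction fetch or load", False, False),
    (4, "out-of-order store required by sequential model", False, False),
    (5, "other out-of-order store", False, False),
    (6, "zero-length load (lswx)", False, False),
    (7, "zero-length store (stswx)", False, False),
    (8, "stwcx. that does not store", True, True),
    (9, "in-order instruction fetch", True, False),
    (10, "load or eciwx", True, False),
    (11, "store, ecowx, or dcbz", True, True),
    (12, "icbi, dcbt, or dcbtst", False, False),
    (13, "dcbst or dcbf", True, False),
    (14, "dcbi", True, True),
]


def rc_setting(matching):
    """Return (R set, C set) for the highest-priority scenario in matching."""
    for _priority, scenario, r_set, c_set in MPC750_RC:
        if scenario in matching:
            return r_set, c_set
    raise ValueError("no scenario matched")
```

For an stwcx. instruction that causes a protection violation and has no reservation, both row 2 and row 8 match; the lookup returns the row 2 result, so C is not altered.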


For more information, see “Page History Recording” in Chapter 7, “Memory
Management,” of the Programming Environments Manual. 




5.4.2 Page Memory Protection 


The MPC750 implements page memory protection as it is defined in Chapter 7, “Memory 
Management,” in the Programming Environments Manual. 


5.4.3 TLB Description


The MPC750 implements separate 128-entry data and instruction TLBs to maximize 
performance. This section describes the hardware resources provided in the MPC750 to 
facilitate page address translation. Note that the hardware implementation of the MMU is 
not specified by the architecture, and while this description applies to the MPC750, it does 
not necessarily apply to other processors of this family. 


5.4.3.1 TLB Organization


Because the MPC750 has two MMUs (IMMU and DMMU) that operate in parallel, some 
of the MMU resources are shared, and some are actually duplicated (shadowed) in each 
MMU to maximize performance. For example, although the architecture defines a single 
set of segment registers for the MMU, the MPC750 maintains two identical sets of segment 
registers, one for the IMMU and one for the DMMU; when an instruction that updates the 
segment register executes, the MPC750 automatically updates both sets. 


Each TLB contains 128 entries organized as a two-way set-associative array with 64 sets as 
shown in Figure 5-7 for the DTLB (the ITLB organization is the same). When an address 
is being translated, a set of two TLB entries is indexed in parallel with the access to a 
segment register. If the address in one of the two TLB entries is valid and matches the 40-bit 
virtual page number, that TLB entry contains the translation. If no match is found, a TLB 
miss occurs. 


On a TLB miss, a hardware table search operation begins unless the access is out of order.
If the access is out of order, the table search operation is postponed until the access is
required, at which point the access is no longer out of order. When the matching PTE is
found in memory, it is loaded into the TLB entry selected by the least-recently-used (LRU)
replacement algorithm, and the translation process begins again, this time with a TLB hit.


Figure 5-7. Segment Register and DTLB Organization


The TLB entries are on-chip copies of PTEs in the page tables in memory and are similar
in structure. To uniquely identify a TLB entry as the required PTE, the TLB entry also
contains four more bits of the page index, EA[10-13] (in addition to the API bits in the
PTE).


Software cannot access the TLB arrays directly, except to invalidate an entry with the tlbie 
instruction. 


Each set of TLB entries has one associated LRU bit. The LRU bit for a set is updated any 
time either entry is used, even if the access is speculative. Invalid entries are always the first 
to be replaced. 
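
The replacement rule above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware implementation: the names `TlbSet`, `touch`, and `choose_victim` are hypothetical, and the convention that the LRU bit holds the index of the least recently used entry is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TlbSet:
    """One two-entry TLB set with a single LRU bit (illustrative model)."""
    valid: list = field(default_factory=lambda: [False, False])
    lru: int = 0  # assumed convention: index of the least recently used way

    def touch(self, way: int) -> None:
        """Record a use of `way`; the other way becomes least recently used."""
        self.lru = 1 - way

    def choose_victim(self) -> int:
        """Pick the way to replace: an invalid entry first, else the LRU way."""
        for way in (0, 1):
            if not self.valid[way]:
                return way
        return self.lru


s = TlbSet()
first = s.choose_victim()      # both ways invalid, so an invalid way is chosen
s.valid[first] = True
s.touch(first)
second = s.choose_victim()     # the remaining invalid way is chosen next
s.valid[second] = True
s.touch(second)
third = s.choose_victim()      # both valid: fall back to the LRU bit
```

Because `touch` is called on every use, a speculative access in this model would update the LRU bit in the same way as a nonspeculative one, which matches the behavior described above.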


Although both MMUs can be accessed simultaneously (both sets of segment registers and
TLBs can be accessed in the same clock), only one exception condition can be reported at
a time. ITLB miss exception conditions are reported when there are no more instructions to
be dispatched or retired (the pipeline is empty). Refer to Chapter 6, “Instruction Timing,”
for more detailed information about the internal pipelines and the reporting of exceptions.


When an instruction or data access occurs, the effective address is routed to the appropriate
MMU. EA[0-3] select one of the 16 segment registers, and the remaining effective address
bits and the VSID field from the segment register are passed to the TLB. EA[14-19] then
select two entries in the TLB; the valid bits are checked and the 40-bit virtual page number
(24-bit VSID and EA[4-19]) must match the VSID, EAPI, and API fields of the TLB
entries. If one of the entries hits, the PP bits are checked for a protection violation. If these
bits do not cause an exception, the C bit is checked and a table search operation is initiated
if C must be updated. If C does not require updating, the RPN value is passed to the memory
subsystem and the WIMG bits are then used as attributes for the access.
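
The field selection described above can be illustrated with a short Python sketch. It is a model for readers, not a description of the hardware data path: the function names and the sample VSID are hypothetical, and the assignment of the API field to EA[4-9] is taken from the 32-bit PTE format in the Programming Environments Manual rather than from this section. Bit 0 is the most significant bit, following PowerPC numbering.

```python
def split_effective_address(ea: int) -> dict:
    """Split a 32-bit EA into the fields used for page address translation."""
    ea &= 0xFFFFFFFF
    return {
        "sr_index": ea >> 28,                # EA[0-3]: 1 of 16 segment registers
        "page_index": (ea >> 12) & 0xFFFF,   # EA[4-19]
        "api": (ea >> 22) & 0x3F,            # EA[4-9] (assumed API position)
        "eapi": (ea >> 18) & 0xF,            # EA[10-13]: extra TLB tag bits
        "tlb_set": (ea >> 12) & 0x3F,        # EA[14-19]: 1 of 64 TLB sets
        "byte_offset": ea & 0xFFF,           # EA[20-31]
    }


def virtual_page_number(vsid: int, ea: int) -> int:
    """Form the 40-bit VPN: the 24-bit VSID concatenated with EA[4-19]."""
    return ((vsid & 0xFFFFFF) << 16) | ((ea >> 12) & 0xFFFF)


fields = split_effective_address(0x30012ABC)
vpn = virtual_page_number(0x123456, 0x30012ABC)   # 0x123456 is a made-up VSID
```

For the sample address, segment register 3 is selected, TLB set 18 is indexed, and the byte offset 0xABC passes through untranslated.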


Although address translation is disabled on a reset condition, the valid bits of TLB entries 
are not automatically cleared. Thus, TLB entries must be explicitly cleared by the system 
software (with the tlbie instruction) before address translation is enabled. Also, note that 
the segment registers do not have a valid bit, and so they should also be initialized before 
translation is enabled. 


5.4.3.2 TLB Invalidation 


The MPC750 implements the optional tlbie and tlbsync instructions, which are used to
invalidate TLB entries. The execution of the tlbie instruction always invalidates four
entries: both the ITLB and DTLB entries indexed by EA[14-19].


The architecture allows tlbie to optionally enable a TLB invalidate signaling mechanism in 
hardware so that other processors also invalidate their resident copies of the matching PTE. 
The MPC750 does not signal the TLB invalidation to other processors nor does it perform 
any action when a TLB invalidation is performed by another processor. 


The tlbsync instruction causes instruction execution to stop if the TLBISYNC signal is
asserted. If TLBISYNC is negated, instruction execution may continue or resume after the
completion of a tlbsync instruction. Section 8.8.2, “TLBISYNC Input,” describes the TLB
synchronization mechanism in further detail.


The tlbia instruction is not implemented on the MPC750; when its opcode is
encountered, an illegal instruction program exception is generated. To invalidate all entries
of both TLBs, 64 tlbie instructions must be executed, incrementing the value in
EA[14-19] by one each time. See Chapter 8, “Instruction Set,” in the Programming
Environments Manual for detailed information about the tlbie instruction.
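
As a rough illustration of the loop described above, the following Python sketch lists the 64 effective addresses that step EA[14-19] through every set index. The constant and function names are hypothetical. On the processor, each address would be the operand of one tlbie instruction; any synchronization the system requires around the loop is outside the scope of this sketch.

```python
PAGE_SHIFT = 12   # EA[20-31] is the 12-bit byte offset within a page
TLB_SETS = 64     # EA[14-19] selects one of 64 sets


def tlbie_addresses():
    """Return one effective address per TLB set, stepping EA[14-19] from 0 to 63."""
    return [index << PAGE_SHIFT for index in range(TLB_SETS)]


addresses = tlbie_addresses()
```

The addresses run from 0x00000 to 0x3F000 in steps of 0x1000. Only EA[14-19] matters for set selection, so the remaining address bits are left at zero here.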


Software must ensure that instruction fetches or memory references to the virtual pages 
specified by the tlbie have been completed prior to executing the tlbie instruction. 


Other than the possible TLB miss on the next instruction prefetch, the tlbie instruction does
not affect the instruction fetch operation; that is, the prefetch buffer is not purged and the
instructions in it are not refetched.


5.4.4 Page Address Translation Summary


Figure 5-8 provides the detailed flow for the page address translation mechanism.


The figure includes the checking of the N bit in the segment descriptor and then expands
on the “TLB Hit” branch of Figure 5-6. The detailed flow for the “TLB Miss” branch of
Figure 5-6 is described in Section 5.4.5, “Page Table Search Operation.” Note that, as in the
case of block address translation, if an attempt is made to execute a dcbz instruction to a
page marked either write-through or caching-inhibited (W = 1 or I = 1), an alignment
exception is generated. The checking of memory protection violation conditions is
described in Chapter 7, “Memory Management,” in the Programming Environments
Manual.




[Flowchart. Recoverable labels: Effective Address Generated; Generate 52-Bit Virtual Address
from Segment Descriptor; Compare Virtual Address with TLB Entries; TLB hit case; instruction
fetch with N bit set in segment descriptor (no-execute); Check Page Memory Protection
Violation Conditions (see the Programming Environments Manual); access permitted; access
prohibited (page memory protection violation); store access with PTE[C] = 0 (page table
search operation, see Figure 5-9); dcbz instruction with W or I = 1 (alignment exception);
PA[0-31] ← RPN || A[20-31]; Continue Access to Memory Subsystem with WIMG Bits from PTE.]

Figure 5-8. Page Address Translation Flow—TLB Hit 


5.4.5 Page Table Search Operation 


If the translation is not found in the TLBs (a TLB miss), the MPC750 initiates a table search
operation, which is described in this section. Formats for the PTE are given in “PTE Format
for 32-Bit Implementations,” in Chapter 7, “Memory Management,” of the Programming
Environments Manual.


The following is a summary of the page table search process performed by the MPC750: 


1. The 32-bit physical address of the primary PTEG is generated as described in “Page
   Table Addresses” in Chapter 7, “Memory Management,” of the Programming
   Environments Manual.

2. The first PTE (PTE0) in the primary PTEG is read from memory. PTE reads occur
   with an implied WIM memory/cache mode control bit setting of 0b001. Therefore,
   they are considered cacheable and read (burst) from memory and placed in the
   cache.

3. The PTE in the selected PTEG is tested for a match with the virtual page number
   (VPN) of the access. The VPN is the VSID concatenated with the page index field
   of the virtual address. For a match to occur, the following must be true:

   — PTE[H] = 0
   — PTE[V] = 1
   — PTE[VSID] = VA[0-23]
   — PTE[API] = VA[24-29]

4. If a match is not found, step 3 is repeated for each of the other seven PTEs in the
   primary PTEG. If a match is found, the table search process continues as described
   in step 8. If a match is not found within the 8 PTEs of the primary PTEG, the
   address of the secondary PTEG is generated.

5. The first PTE (PTE0) in the secondary PTEG is read from memory. Again, because
   PTE reads have a WIM bit combination of 0b001, an entire cache line is read into
   the on-chip cache.

6. The PTE in the selected secondary PTEG is tested for a match with the virtual page
   number (VPN) of the access. For a match to occur, the following must be true:

   — PTE[H] = 1
   — PTE[V] = 1
   — PTE[VSID] = VA[0-23]
   — PTE[API] = VA[24-29]

7. If a match is not found, step 6 is repeated for each of the other seven PTEs in the
   secondary PTEG. If it is never found, an exception is taken (step 9).

8. If a match is found, the PTE is written into the on-chip TLB and the R bit is
   updated in the PTE in memory (if necessary). If there is no memory protection
   violation, the C bit is also updated in memory (if the access is a write operation)
   and the table search is complete.

9. If a match is not found within the 8 PTEs of the secondary PTEG, the search fails,
   and a page fault exception condition occurs (either an ISI exception or a DSI
   exception).
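
The matching logic in steps 3 through 9 can be sketched in Python as follows. This is a simplified software model under stated assumptions, not the hardware algorithm: the `Pte` class and both function names are hypothetical, each PTEG is passed in as a ready-made list of eight entries (PTEG address generation from step 1 is omitted), and the R and C updates and TLB load of step 8 are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pte:
    """Only the PTE fields that take part in the match (illustrative model)."""
    v: int      # valid bit
    vsid: int   # 24-bit virtual segment ID, compared with VA[0-23]
    h: int      # hash function identifier: 0 = primary, 1 = secondary
    api: int    # abbreviated page index, compared with VA[24-29]
    rpn: int    # physical page number returned on a match


def search_pteg(pteg: List[Pte], vsid: int, api: int, h: int) -> Optional[Pte]:
    """Test the eight PTEs of one PTEG in order (steps 3-4 or 6-7)."""
    for pte in pteg[:8]:
        if pte.v == 1 and pte.h == h and pte.vsid == vsid and pte.api == api:
            return pte
    return None


def table_search(primary: List[Pte], secondary: List[Pte],
                 vsid: int, api: int) -> Optional[Pte]:
    """Return the matching PTE, or None when the search fails (step 9)."""
    match = search_pteg(primary, vsid, api, h=0)
    if match is None:
        match = search_pteg(secondary, vsid, api, h=1)
    return match


# Made-up sample data: the only valid PTE sits last in the secondary PTEG.
empty = Pte(v=0, vsid=0, h=0, api=0, rpn=0)
primary = [empty] * 8
secondary = [empty] * 7 + [Pte(v=1, vsid=0x123456, h=1, api=0x2A, rpn=0x00ABC)]
hit = table_search(primary, secondary, vsid=0x123456, api=0x2A)
miss = table_search(primary, secondary, vsid=0x123456, api=0x01)
```

A return value of `None` corresponds to the page fault condition in step 9; the caller would decide whether it is reported as an ISI or a DSI exception.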


Figure 5-9 and Figure 5-10 show how the conceptual model for the primary and secondary
page table search operations, described in the Programming Environments Manual, is
realized in the MPC750.


Figure 5-9 shows the case of a dcbz instruction that is executed with W = 1 or I = 1, and
that the R bit may be updated in memory (if required) before the operation is performed or
the alignment exception occurs. The R bit may also be updated if memory protection is
violated.


5.4.6 Page Table Updates


When TLBs are implemented (as in the MPC750), they are defined as noncoherent caches
of the page tables. TLB entries must be flushed explicitly with the TLB invalidate entry 
instruction (tlbie) whenever the corresponding PTE is modified. As the MPC750 is 
intended primarily for uniprocessor environments, it does not provide coherency of TLBs 
between multiple processors. If the MPC750 is used in a multiprocessor environment 
where TLB coherency is required, all synchronization must be implemented in software. 


Processors may write referenced and changed bits with unsynchronized, atomic byte store 
operations. Note that the V, R, and C bits each reside in a distinct byte of a PTE. Therefore, 
extreme care must be taken to use byte writes when updating only one of these bits. 
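
To make the byte-write requirement concrete, the following Python sketch models an 8-byte PTE as a `bytearray` and updates R or C by touching a single byte. The bit positions are an assumption taken from the 32-bit PTE format in the Programming Environments Manual (V in word 0 bit 0, R in word 1 bit 23, C in word 1 bit 24, with bit 0 the most significant bit); the constant and function names are hypothetical.

```python
# (byte offset within the 8-byte PTE, mask within that byte), big-endian order.
# Assumed positions: V = word 0 bit 0, R = word 1 bit 23, C = word 1 bit 24.
V_BYTE, V_MASK = 0, 0x80
R_BYTE, R_MASK = 6, 0x01
C_BYTE, C_MASK = 7, 0x80


def set_referenced(pte: bytearray) -> None:
    """Set R with a single-byte update, leaving the V and C bytes untouched."""
    pte[R_BYTE] |= R_MASK


def set_changed(pte: bytearray) -> None:
    """Set C with a single-byte update, leaving the V and R bytes untouched."""
    pte[C_BYTE] |= C_MASK


pte = bytearray(8)      # an all-zero PTE, used only as sample data
set_referenced(pte)
after_r = bytes(pte)
set_changed(pte)
after_c = bytes(pte)
```

Because each helper writes exactly one byte, it cannot overwrite an R or C update that the processor makes to a neighboring byte at the same time, which is the hazard a word-wide read-modify-write would create.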


Explicitly altering certain MSR bits (using the mtmsr instruction), PTEs, or certain system
registers may have the side effect of changing the effective or physical addresses from
which the current instruction stream is being fetched. This kind of side effect is defined as
an implicit branch. Implicit branches are not supported, and an attempt to perform one
causes boundedly undefined results. Therefore, PTEs must not be changed in a manner that
causes an implicit branch. Chapter 2, “PowerPC Register Set,” in the Programming
Environments Manual, lists the possible implicit branch conditions that can occur when
system registers and MSR bits are changed.


5.4.7 Segment Register Updates 


Synchronization requirements for using the move to segment register instructions are
described in “Synchronization Requirements for Special Registers and for Lookaside
Buffers” in Chapter 2, “PowerPC Register Set,” in the Programming Environments Manual.


Chapter 6
Instruction Timing


This chapter describes how the MPC750 microprocessor fetches, dispatches, and executes 
instructions and how it reports the results of instruction execution. It gives detailed 
descriptions of how the MPC750 execution units work, and how those units interact with 
other parts of the processor, such as the instruction fetching mechanism, register files, and 
caches. It gives examples of instruction sequences, showing potential bottlenecks and how 
to minimize their effects. Finally, it includes tables that identify the unit that executes each 
instruction implemented on the MPC750, the latency for each instruction, and other 
information that is useful for the assembly language programmer. 


Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions 
for the MPC750 apply for the MPC755 except as noted in Appendix C, “MPC755 
Embedded G3 Microprocessor.” 


6.1 Terminology and Conventions


This section provides an alphabetical glossary of terms used in this chapter. These 
definitions are provided as a review of commonly used terms and as a way to point out 
specific ways these terms are used in this chapter. 


• Branch prediction—The process of guessing whether a branch will be taken. Such
predictions can be correct or incorrect; the term ‘predicted’ as it is used here does
not imply that the prediction is correct (successful). The PowerPC architecture
defines a means for static branch prediction as part of the instruction encoding.

• Branch resolution—The determination of whether a branch is taken or not taken. A
branch is said to be resolved when the processor can determine which instruction
path to take. If the branch is resolved as predicted, the instructions following the
predicted branch that may have been speculatively executed can complete (see
completion). If the branch is not resolved as predicted, instructions on the
mispredicted path, and any results of speculative execution, are purged from the
pipeline and fetching continues from the nonpredicted path.

• Completion—Completion occurs when an instruction has finished executing,
written back any results, and been removed from the completion queue. When an
instruction completes, it is guaranteed that this instruction and all previous
instructions can cause no exceptions.

• Fall-through (branch fall-through)—A not-taken branch. On the MPC750,
fall-through branch instructions are removed from the instruction stream at dispatch.
That is, these instructions are allowed to fall through the instruction queue via the
dispatch mechanism, without being either passed to an execution unit or given
a position in the completion queue.


• Fetch—The process of bringing instructions from memory (such as a cache or
system memory) into the instruction queue.

• Folding (branch folding)—The replacement with target instructions of a branch
instruction and any instructions along the not-taken path when a branch is either
taken or predicted as taken.

• Finish—Finishing occurs in the last cycle of execution. In this cycle, the completion
queue entry is updated to indicate that the instruction has finished executing.

• Latency—The number of clock cycles necessary to execute an instruction and make
ready the results of that execution for a subsequent instruction.

• Pipeline—In the context of instruction timing, the term ‘pipeline’ refers to the
interconnection of the stages. The events necessary to process an instruction are
broken into several cycle-length tasks to allow work to be performed on several
instructions simultaneously, analogous to an assembly line. As an instruction is
processed, it passes from one stage to the next. When it does, the stage becomes
available for the next instruction.

Although an individual instruction may take many cycles to complete (the number
of cycles is called instruction latency), pipelining makes it possible to overlap the
processing so that the throughput (number of instructions completed per cycle) is
greater than if pipelining were not implemented.

• Program order—The order of instructions in an executing program. More
specifically, this term is used to refer to the original order in which program
instructions are fetched into the instruction queue from the cache.

• Rename register—A temporary buffer used by instructions that have finished
execution but have not completed.

• Reservation station—A buffer between the dispatch and execute stages that allows
instructions to be dispatched even though the results of instructions on which the
dispatched instruction may depend are not available.


• Retirement—Removal of the completed instruction from the completion queue.

• Stage—The term ‘stage’ is used in two different senses, depending on whether the
pipeline is being discussed as a physical entity or a sequence of events. In the latter
case, a stage is an element in the pipeline during which certain actions are
performed, such as decoding the instruction, performing an arithmetic operation, or
writing back the results. A stage is typically described as taking a processor clock
cycle to perform its operation; however, some events (such as dispatch and
write-back) happen instantaneously, and may be thought to occur at the end of the
stage.

An instruction can spend multiple cycles in one stage. An integer multiply, for
example, takes multiple cycles in the execute stage. When this occurs, subsequent
instructions may stall.

In some cases, an instruction may also occupy more than one stage simultaneously,
especially in the sense that a stage can be seen as a physical resource. For example,
when instructions are dispatched they are assigned a place in the completion queue
at the same time they are passed to the execute stage. They can be said to occupy
both the complete and execute stages in the same clock cycle.

• Stall—An occurrence when an instruction cannot proceed to the next stage.

• Superscalar—A superscalar processor is one that can issue multiple instructions
concurrently from a conventional linear instruction stream. In a superscalar
implementation, multiple instructions can be in the execute stage at the same time.

• Throughput—A measure of the number of instructions that are processed per cycle.
For example, a series of double-precision floating-point multiply instructions has a
throughput of one instruction per clock cycle.

• Write-back—Write-back (in the context of instruction handling) occurs when a
result is written into the architectural registers (typically the GPRs and FPRs).
Results are written back at completion time. Results in the write-back buffer cannot
be flushed. If an exception occurs, these buffers must write back before the
exception is taken.


6.2 Instruction Timing Overview 


The MPC750 design minimizes average instruction execution latency, the number of clock 
cycles it takes to fetch, decode, dispatch, and execute instructions and make the results 
available for a subsequent instruction. Some instructions, such as loads and stores, access 
memory and require additional clock cycles between the execute phase and the write-back 
phase. These latencies vary depending on whether the access is to cacheable or 
noncacheable memory, whether it hits in the L1 or L2 cache, whether the cache access 
generates a write-back to memory, whether the access causes a snoop hit from another 
device that generates additional activity, and other conditions that affect memory accesses. 


The MPC750 implements many features to improve throughput, such as pipelining, 
superscalar instruction issue, branch folding, removal of fall-through branches, two-level 
speculative branch handling, and multiple execution units that operate independently and 
in parallel. 


As an instruction passes from stage to stage in a pipelined system, the following instruction
can follow through the stages as the former instruction vacates them, allowing several
instructions to be processed simultaneously. While it may take several cycles for an
instruction to pass through all the stages, when the pipeline has been filled, one instruction
can complete its work on every clock cycle.


Figure 6-1 represents a generic pipelined execution unit.


           Stage 1          Stage 2          Stage 3

Clock 0    Instruction A
Clock 1    Instruction B    Instruction A
Clock 2    Instruction C    Instruction B    Instruction A
Clock 3    Instruction D    Instruction C    Instruction B


Figure 6-1. Pipelined Execution Unit
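
The occupancy pattern in Figure 6-1 can be reproduced with a small Python sketch. It is an idealized model that assumes every stage takes exactly one clock and that nothing stalls; the function name `pipeline_occupancy` is hypothetical.

```python
def pipeline_occupancy(instructions, stages=3):
    """Return, for each clock, the instruction held in each stage (or None)."""
    clocks = len(instructions) + stages - 1   # clocks needed to drain the pipeline
    table = []
    for clock in range(clocks):
        row = []
        for stage in range(stages):
            i = clock - stage   # instruction i enters the first stage at clock i
            row.append(instructions[i] if 0 <= i < len(instructions) else None)
        table.append(row)
    return table


table = pipeline_occupancy(["A", "B", "C", "D"])
```

Under these assumptions, four instructions drain from a three-stage pipeline in 4 + 3 - 1 = 6 clocks, and once the pipeline is full one instruction leaves the last stage on every clock.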


The entire path that instructions take through the fetch, decode/dispatch, execute, complete, 
and write-back stages is considered the MPC750’s master pipeline, and two of the 
MPC750’s execution units (the FPU and LSU) are also multiple-stage pipelines. 


The MPC750 contains the following execution units that operate independently and in 
parallel: 


• Branch processing unit (BPU)

• Integer unit 1 (IU1)—executes all integer instructions

• Integer unit 2 (IU2)—executes all integer instructions except multiplies and divides

• 64-bit floating-point unit (FPU)

• Load/store unit (LSU)

• System register unit (SRU)


The MPC750 can retire two instructions on every clock cycle. In general, the MPC750
processes instructions in four stages (fetch, decode/dispatch, execute, and complete), as
shown in Figure 6-2. Note that the example of a pipelined execution unit in Figure 6-1 is
similar to the three-stage FPU pipeline in Figure 6-2.


Figure 6-2. Superscalar/Pipeline Diagram


The instruction pipeline stages are described as follows: 


• The instruction fetch stage includes the clock cycles necessary to request
instructions from the memory system and the time the memory system takes to
respond to the request. Instruction fetch timing depends on many variables, such as
whether the instruction is in the branch target instruction cache, the on-chip
instruction cache, or the L2 cache. Additional factors apply when it is necessary to
fetch instructions from system memory; these include the processor-to-bus clock
ratio, the amount of bus traffic, and whether any cache coherency operations are
required.

Because there are so many variables, unless otherwise specified, the instruction
timing examples below assume optimal performance; that is, the instructions are
available in the instruction queue in the same clock cycle that they are requested. The
fetch stage ends when the instruction is dispatched.

• The decode/dispatch stage consists of the time it takes to fully decode the instruction
and dispatch it from the instruction queue to the appropriate execution unit.
Instruction dispatch requires the following:

— Instructions can be dispatched only from the two lowest instruction queue
entries, IQ0 and IQ1.

— A maximum of two instructions can be dispatched per clock cycle (although an
additional branch instruction can be handled by the BPU).

— Only one instruction can be dispatched to each execution unit per clock cycle.

— There must be a vacancy in the specified execution unit.

— A rename register must be available for each destination operand specified by the
instruction.

— For an instruction to dispatch, the appropriate execution unit must be available
and there must be an open position in the completion queue. If no entry is
available, the instruction remains in the IQ.

• The execute stage consists of the time between dispatch to the execution unit (or
reservation station) and the point at which the instruction vacates the execution unit.

Most integer instructions have a one-cycle latency; results of these instructions can
be used in the clock cycle after an instruction enters the execution unit. However,
integer multiply and divide instructions take multiple clock cycles to complete. The
IU1 can process all integer instructions; the IU2 can process all integer instructions
except multiply and divide instructions.

The LSU and FPU are pipelined (as shown in Figure 6-2).

• The complete (complete/write-back) pipeline stage maintains the correct
architectural machine state and commits it to the architectural registers at the proper
time. If the completion logic detects an instruction containing an exception status,
all following instructions are cancelled, their execution results in rename registers
are discarded, and the correct instruction stream is fetched.

The complete stage ends when the instruction is retired. Two instructions can be
retired per cycle. Instructions are retired only from the two lowest completion queue
entries, CQ0 and CQ1.
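
The dispatch requirements listed under the decode/dispatch stage can be expressed as a single predicate. The Python sketch below is illustrative only: the `Candidate` class, the `can_dispatch` function, and all parameter names are hypothetical, and a single count of free rename registers stands in for whatever rename resources the instruction actually needs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    iq_entry: int        # position in the instruction queue (0 = IQ0)
    unit: str            # target execution unit, for example "IU1" or "LSU"
    dest_operands: int   # number of rename registers the instruction needs


def can_dispatch(c, dispatched_units, unit_has_vacancy,
                 free_renames, free_cq_entries):
    """Apply the listed dispatch requirements to one candidate instruction."""
    return (c.iq_entry in (0, 1)                      # only IQ0 and IQ1 dispatch
            and len(dispatched_units) < 2             # at most two per clock
            and c.unit not in dispatched_units        # one per unit per clock
            and unit_has_vacancy.get(c.unit, False)   # vacancy in the unit
            and free_renames >= c.dest_operands       # rename register available
            and free_cq_entries >= 1)                 # open completion queue entry


# Made-up machine state for one clock cycle.
vacancy = {"IU1": True, "IU2": True, "LSU": False}
ok = can_dispatch(Candidate(0, "IU1", 1), [], vacancy, 2, 1)
blocked = can_dispatch(Candidate(1, "LSU", 1), ["IU1"], vacancy, 1, 1)
same_unit = can_dispatch(Candidate(1, "IU1", 1), ["IU1"], vacancy, 1, 1)
```

In this sample state the first instruction dispatches, while the second is held in the instruction queue, either because the LSU has no vacancy or because IU1 has already accepted an instruction in the same clock.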


The notation conventions used in the instruction timing examples are as follows: 


Fetch—The fetch stage includes the time between when an instruction is
requested and when it is brought into the instruction queue. This latency can be very
variable, depending upon whether the instruction is in the BTIC, the on-chip cache, the L2
cache, or system memory (in which case latency can be affected by bus speed, traffic on
the system bus, and address translation issues). Therefore, in the examples in this chapter,
the fetch stage is usually idealized; that is, an instruction is usually shown to be in the fetch
stage when it is a valid instruction in the instruction queue. The instruction queue has six
entries, IQ0-IQ5.


In dispatch entry (IQ0/IQ1)—Instructions can be dispatched from IQ0 and
IQ1. Because dispatch is instantaneous, it is perhaps more useful to describe it as an event
that marks the point in time between the last cycle in the fetch stage and the first cycle in
the execute stage.


Execute—The operations specified by an instruction are being performed by
the appropriate execution unit. The black stripe is a reminder that the instruction occupies
an entry in the completion queue, described in Figure 6-3.


Complete—The instruction is in the completion queue. In the final stage, the
results of the executed instruction are written back and the instruction is retired. The
completion queue has six entries, CQ0-CQ5.


In retirement entry—Completed instructions can be retired from CQ0 and
CQ1. Like dispatch, retirement is an event that in this case occurs at the end of the final
cycle of the complete stage.


Figure 6-3 shows the stages of MPC750 execution units. 


IU1/IU2/SRU instructions:  Fetch, In Dispatch Entry, Execute(1), Complete/Retire

LSU instructions:          Fetch, In Dispatch Entry, Execute (EA Calculation, Cache Align),
                           Complete/Retire

FPU instructions:          Fetch, In Dispatch Entry, Execute (three stages, ending with
                           Round/Normalize), Complete/Retire

BPU instructions:          Fetch, Predict, In Dispatch Entry, In Completion Queue(2),
                           Complete/Retire(2)

(1) Several integer instructions, such as multiply and divide instructions, require multiple
    cycles in the execute stage.

(2) Only those branch instructions that update the LR or CTR take an entry in the completion
    queue.


Figure 6-3. MPC750 Microprocessor Pipeline Stages 


6.3 Timing Considerations


The MPC750 is a superscalar processor; as many as three instructions can be issued to the 
execution units (one branch instruction to the branch processing unit, and two instructions 
issued from the dispatch queue to the other execution units) during each clock cycle. Only 
one instruction can be dispatched to each execution unit. 


Although instructions appear to the programmer to execute in program order, the MPC750
improves performance by executing multiple instructions at a time, using hardware to
manage dependencies. When an instruction is dispatched, the register file provides the
source data to the execution unit. The register files and rename registers have sufficient
bandwidth to allow dispatch of two instructions per clock under most conditions.


The MPC750’s BPU decodes and executes branches immediately after they are fetched.
When a conditional branch cannot be resolved due to a CR data dependency, the branch
direction is predicted and execution continues from the predicted path. If the prediction is
incorrect, the following steps are taken:


1. The instruction queue is purged and fetching continues from the correct path. 


2. Any instructions ahead of the predicted branch in the completion queue are allowed 
to complete. 


3. Instructions after the mispredicted branch are purged. 
4. Dispatching resumes from the correct path. 


After an execution unit finishes executing an instruction, it places resulting data into the 
appropriate GPR or FPR rename register. The results are then stored into the correct GPR 
or FPR during the write-back stage. If a subsequent instruction needs the result as a source 
operand, it is made available simultaneously to the appropriate execution unit, which allows 
a data-dependent instruction to be decoded and dispatched without waiting to read the data 
from the register file. Branch instructions that update either the LR or CTR write back their 
results in a similar fashion. 


The following section describes this process in greater detail. 


6.3.1 General Instruction Flow 


As many as four instructions can be fetched into the instruction queue (IQ) in a single clock 
cycle. Instructions enter the IQ and are issued to the various execution units from the 
dispatch queue. The MPC750 tries to keep the IQ full at all times, unless instruction cache 
throttling is operating. 


The number of instructions requested in a clock cycle is determined by the number of 
vacant spaces in the IQ during the previous clock cycle. This is shown in the examples in 
this chapter. Although the instruction queue can accept as many as four new instructions in 
a single clock cycle, if only one IQ entry is vacant, only one instruction is fetched. Typically 
instructions are fetched from the on-chip instruction cache, but they may also be fetched 
from the branch target instruction cache (BTIC). If the instruction request hits in the BTIC, 
it can usually present the first two instructions of the new instruction stream in the next 
clock cycle, giving enough time for the next pair of instructions to be fetched from the 
instruction cache with no idle cycles. If instructions are not in the BTIC or the on-chip 
instruction cache, they are fetched from the L2 cache or from system memory. 
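
The fetch-sizing rule described above can be sketched as follows. This is a minimal illustration rather than a model of the fetcher: the function and constant names are hypothetical, and the six-entry queue depth is assumed from the IQ0-IQ5 entries drawn in the figures of this chapter.

```python
# Illustrative sketch only; names are hypothetical and not part of the MPC750 documentation.
MAX_FETCH_PER_CYCLE = 4   # as many as four instructions can be fetched per clock cycle
IQ_ENTRIES = 6            # assumed from IQ0-IQ5 in the figures of this chapter


def instructions_to_fetch(vacant_prev_cycle: int) -> int:
    """Return how many instructions are requested in the current clock cycle,
    given how many IQ entries were vacant during the previous clock cycle."""
    if not 0 <= vacant_prev_cycle <= IQ_ENTRIES:
        raise ValueError("vacancy count must be between 0 and IQ_ENTRIES")
    return min(MAX_FETCH_PER_CYCLE, vacant_prev_cycle)
```

For example, one vacant entry in the previous cycle yields a request for one instruction, while a completely empty queue still yields at most four.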


The MPC750’s instruction cache throttling feature, managed through the instruction cache 
throttling control (ICTC) register, can lower the processor’s overall junction temperature by 
slowing the instruction fetch rate. See Chapter 10, “Power and Thermal Management.” 


Branch instructions are identified by the fetcher and forwarded to the BPU directly, 
bypassing the dispatch queue. If the branch is unconditional or if the specified conditions 
are already known, the branch can be resolved immediately. That is, the branch direction is 
known and instruction fetching can continue from the correct location. Otherwise, the 
branch direction must be predicted. The MPC750 offers several resources to aid in quick 
resolution of branch instructions and for improving the accuracy of branch predictions. 
These include the following: 


• Branch target instruction cache—The 64-entry (four-way-associative) branch target 
instruction cache (BTIC) holds branch target instructions so when a branch is 
encountered in a repeated loop, usually the first two instructions in the target stream 
can be fetched into the instruction queue on the next clock cycle. The BTIC can be 
disabled and invalidated through bits in HID0. 


• Dynamic branch prediction—The 512-entry branch history table (BHT) is 
implemented with two bits per entry for four degrees of prediction—not-taken, 
strongly not-taken, taken, strongly taken. Whether a branch instruction is taken or 
not-taken can change the strength of the next prediction. This dynamic branch 
prediction is not defined by the PowerPC architecture. 


To reduce aliasing, only predicted branches update the BHT entries. Dynamic 
branch prediction is enabled by setting HID0[BHT]; otherwise, static branch 
prediction is used. 


• Static branch prediction—Static branch prediction is defined by the PowerPC 
architecture and involves encoding the branch instructions. See Section 6.4.1.3.1, 
“Static Branch Prediction.” 
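
The two-bit BHT entries described in the list above resemble a conventional two-bit saturating counter. The sketch below shows that conventional scheme only. The exact state transitions, the indexing of the 512 entries, and all names used here are assumptions, since this section does not specify them.

```python
# Conventional two-bit saturating counter; the transition scheme is an assumption,
# not a description of the actual MPC750 BHT logic.
STRONGLY_NOT_TAKEN, NOT_TAKEN, TAKEN, STRONGLY_TAKEN = range(4)


def predict_taken(state: int) -> bool:
    """Predict taken for the two 'taken' states, not taken otherwise."""
    return state >= TAKEN


def update(state: int, branch_taken: bool) -> int:
    """Move one step toward the observed outcome, saturating at either end."""
    if branch_taken:
        return min(state + 1, STRONGLY_TAKEN)
    return max(state - 1, STRONGLY_NOT_TAKEN)
```

With this scheme a single contrary outcome weakens a strong prediction without reversing it, which is one way the outcome of a branch "can change the strength of the next prediction."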


Branch instructions that do not update the LR or CTR are removed from the instruction 
stream either by branch folding or removal of fall-through branch instructions, as described 
in Section 6.4.1.1, “Branch Folding and Removal of Fall-Through Branch Instructions.” 
Branch instructions that update the LR or CTR are treated as if they require dispatch (even 
though they are not issued to an execution unit in the process). They are assigned a position 
in the completion queue to ensure that the CTR and LR are updated sequentially. 


All other instructions are issued from IQ0 and IQ1. The dispatch rate depends upon the 
availability of resources such as the execution units, rename registers, and completion 
queue entries, and upon the serializing behavior of some instructions. Instructions are 
dispatched in program order; an instruction in IQ1 cannot be dispatched ahead of one in 
IQ0. 


Figure 6-4 shows the paths taken by instructions. 


Figure 6-4. Instruction Flow Diagram 


6.3.2 Instruction Fetch Timing 


Instruction fetch latency depends on whether the fetch hits the BTIC, the on-chip 
instruction cache, or the L2 cache, if one is implemented. If no cache hit occurs, a memory 
transaction is required, in which case fetch latency is affected by bus traffic, bus clock speed, 
and memory translation. These issues are discussed further in the following sections. 


6.3.2.1 Cache Arbitration 


When the instruction fetcher requests instructions from the instruction cache, two things 
may happen. If the instruction cache is idle and the requested instructions are present, they 
are provided on the next clock cycle. However, if the instruction cache is busy due to a 
cache-line-reload operation, instructions cannot be fetched until that operation completes. 


6.3.2.2 Cache Hit 


If the instruction fetch hits the instruction cache, it takes only one clock cycle after the 
request for as many as four instructions to enter the instruction queue. Note that the cache 
is not blocked to internal accesses while a cache reload completes (hits under misses). The 
critical double word is written simultaneously to the cache and forwarded to the requesting 
unit, minimizing stalls due to load delays. 


Figure 6-5 shows a simple example of instruction fetching that hits in the on-chip cache. 
This example uses a series of integer add and double-precision floating-point add 
instructions to show how the number of instructions to be fetched is determined, how 
program order is maintained by the instruction and completion queues, how instructions are 
dispatched and retired in pairs (maximum), and how the FPU, IU1, and IU2 pipelines 
function. The following instruction sequence is examined: 


0 add 
1 fadd 
2 add 
3 fadd 
4 br 6 
5 fsub 
6 fadd 
7 fadd 
8 add 
9 add 
10 add 
11 add 
12 fadd 
13 add 
14 fadd 


Figure 6-5. Instruction Timing—Cache Hit 


The instruction timing for this example is described cycle-by-cycle as follows: 


0. In cycle 0, instructions 0-3 are fetched from the instruction cache. Instructions 0 and 
1 are placed in the two entries in the instruction queue from which they can be 
dispatched on the next clock cycle. 


1. In cycle 1, instructions 0 and 1 are dispatched to the IU2 and FPU, respectively. 
Notice that for instructions to be dispatched they must be assigned positions in the 
completion queue. In this case, since the completion queue was empty, instructions 
0 and 1 take the two lowest entries in the completion queue. Instructions 2 and 3 
drop into the two dispatch positions in the instruction queue. Because there were 
two positions available in the instruction queue in clock cycle 0, two instructions (4 
and 5) are fetched into the instruction queue. Instruction 4 is a branch unconditional 
instruction, which resolves immediately as taken. Because the branch is taken, it 
can therefore be folded from the instruction queue. 


2. In cycle 2, assume a BTIC hit occurs and target instructions 6 and 7 are fetched into 
the instruction queue, replacing the folded b instruction (4) and instruction 5. 
Instruction 0 completes, writes back its results, and vacates the completion queue by 
the end of the clock cycle. Instruction 1 enters the second FPU execute stage, 
instruction 2 is dispatched to the IU2, and instruction 3 is dispatched into the first 
FPU execute stage. Because the taken branch instruction (4) does not update either 
CTR or LR, it does not require a position in the completion queue and can be 
folded. 


3. In cycle 3, target instructions (6 and 7) are fetched, replacing instructions 4 and 5 in 
IQ0 and IQ1. This replacement on taken branches is called branch folding. 
Instruction 1 proceeds through the last of the three FPU execute stages. Instruction 
2 has executed but must remain in the completion queue until instruction 1 
completes. Instruction 3 replaces instruction 1 in the second stage of the FPU, and 
instruction 6 replaces instruction 3 in the first stage. Also, as will be shown in cycle 
4, there is a single-cycle stall that occurs when the FPU pipeline is full. 


Because there were three vacancies in the instruction queue in the previous clock 
cycle, instructions 8-11 are fetched in this clock cycle. 


4. Instruction 1 completes in cycle 4, allowing instruction 2 to complete. Instructions 
3 and 6 continue through the FPU pipeline. Although instruction 7 is in IQ1, it 
cannot be dispatched because the FPU is busy, and because instruction 7 cannot be 
dispatched, neither can instruction 8. The additional cycle stall allows the 
instruction queue to be completely filled. Because there was one opening in the 
instruction queue in clock cycle 3, one instruction is fetched (12) and the 
instruction queue is full. 


5. In cycle 5, instruction 3 completes, allowing instruction 7 to be dispatched to the 
FPU, which in turn allows instruction 8 to be dispatched to the IU2. Instructions 9 
and 10 drop to the dispatch positions in the instruction queue. No instructions are 
fetched in this clock cycle because there were no vacant IQ entries in clock cycle 4. 


6. In cycle 6, instruction 6 completes, instruction 7 is in stage 2 of the FPU execute 
stage, and although instruction 8 has executed, it must wait for instruction 7 to 
complete. The two integer instructions, 9 and 10, are dispatched to the IU2 and 
IU1, respectively. Fetching resumes with instructions 13 and 14. 


7. In cycle 7, instruction 7 is in the final FPU execute stage and instructions 8-10 wait 
in the completion queue. Instructions 11 and 12 are dispatched to the IU2 and FPU, 
respectively. Note that at this point the completion queue is full. Two more 
instructions (15 and 16, which are shown only in the instruction queue) are fetched. 


8. In cycle 8, instructions 7-11 are through executing. Instructions 7 and 8 complete, 
write back, and vacate the completion queue. Because the completion queue is full, 
instructions 13 and 14 cannot be dispatched and must remain in the instruction 
queue. Only the FPU is executing during this cycle (instruction 12). Additional 
instructions (instructions 16 and 17, shown only in the instruction queue) are 
fetched, filling the instruction queue. 


9. In cycle 9, two more instructions (instructions 9 and 10) are retired from the 
completion queue, allowing instructions 13 and 14 to be dispatched, again filling the 
completion queue. No instructions are fetched on this cycle because the instruction 
queue was full on the previous clock cycle. 


6.3.2.3 Cache Miss 


Figure 6-6 shows an instruction fetch that misses both the on-chip cache and L2 cache. A 
processor/bus clock ratio of 2:1 is used. The same instruction sequence is used as in 
Section 6.3.2.2, “Cache Hit”; however, in this example, the branch target instruction is not 
in either the L1 or L2 cache. Because the target instruction is not in the L1 cache, it cannot 
be in the BTIC. 


A cache miss extends the latency of the fetch stage, so in this example, the fetch stage 
shown represents not only the time the instruction spends in the IQ, but the time required 
for the instruction to be loaded from system memory, beginning in clock cycle 2. 


During clock cycle 3, the target instruction for the b instruction is not in the BTIC, the 
instruction cache or the L2 cache; therefore, a memory access must occur. During clock 
cycle 5, the address of the block of instructions is sent to the system bus. During clock cycle 
7, two instructions (64 bits) are returned from memory on the first beat and are forwarded 
both to the cache and the instruction fetcher. 


* Instructions 5 and 6 are not in the IQ in clock cycle 5. Here, the fetch stage shows cache latency. 


Figure 6-6. Instruction Timing—Cache Miss 


6.3.2.4 L2 Cache Access Timing Considerations 


If an instruction fetch misses both the BTIC and the on-chip instruction cache, the MPC750 
next looks in the L2 cache. (Note that the MPC740 does not implement the L2 cache 
interface.) If the requested instructions are there, they are burst into the MPC750 in much 
the same way as shown in Figure 6-6. The formula for the L2 cache latency for instruction 
accesses is as follows: 


1 processor clock + 3 L2 clocks + 1 processor clock 


Therefore, if the L2 is operating in 2:1 mode, the instruction fetch takes 8 processor clock 
cycles. Additional factors can also affect this latency, including the type of memory used to 
implement the L2 and whether the processor clock and L2 clocks are aligned immediately. 
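
The latency formula above can be worked through for other clock ratios with a short sketch. The function name and parameter are hypothetical, and the sketch deliberately ignores the additional factors just mentioned (memory type and clock alignment), so it gives only the nominal figure.

```python
def l2_instruction_fetch_latency(core_to_l2_ratio: float) -> float:
    """Nominal latency, in processor clocks, of an instruction fetch that hits in the L2:
    1 processor clock + 3 L2 clocks + 1 processor clock.

    core_to_l2_ratio is the number of processor clocks per L2 clock (2 for 2:1 mode).
    """
    if core_to_l2_ratio < 1:
        raise ValueError("the L2 clock is assumed to be no faster than the processor clock")
    return 1 + 3 * core_to_l2_ratio + 1
```

In 2:1 mode this evaluates to 1 + 6 + 1 = 8 processor clocks, matching the figure given in the text.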


For more information about the L2 cache implementation, see Chapter 9, “L2 Cache 
Interface Operation.” 


6.3.3 Instruction Dispatch and Completion Considerations 


Several factors affect the MPC750’s ability to dispatch instructions at a peak rate of two per 
cycle—the availability of the execution unit, destination rename registers, and completion 
queue, as well as the handling of completion-serialized instructions. Several of these 
limiting factors are illustrated in the previous instruction timing examples. 


To reduce dispatch unit stalls due to instruction data dependencies, the MPC750 provides 
a single-entry reservation station for the FPU, SRU, and each IU, and a two-entry 
reservation station for the LSU. If a data dependency keeps an instruction from starting 
execution, that instruction is dispatched to the reservation station associated with its 
execution unit (and the rename registers are assigned), thereby freeing the positions in the 
instruction queue so instructions can be dispatched to other execution units. Execution 
begins during the same clock cycle that the rename buffer is updated with the data the 
instruction is dependent on. 


If the instructions in IQ0 and IQ1 both require the same execution unit, the instruction in IQ1 
cannot be dispatched until the first instruction proceeds through the pipeline and provides 
the subsequent instruction with a vacancy in the requested execution unit. 
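
The in-order dispatch constraints described in this section can be illustrated with a simplified sketch. It assumes each instruction names exactly one required unit, treats a unit as "free" when it or its reservation station can accept an instruction, and ignores rename register availability and serialization. The function name and arguments are hypothetical.

```python
from typing import Iterable, Optional


def dispatch_count(iq0_unit: Optional[str], iq1_unit: Optional[str],
                   free_units: Iterable[str], cq_vacancies: int) -> int:
    """Return how many of the two dispatch entries (IQ0, then IQ1) can dispatch this cycle."""
    available = set(free_units)
    dispatched = 0
    for unit in (iq0_unit, iq1_unit):
        if unit is None:
            break                      # empty dispatch entry
        if unit not in available:
            break                      # unit busy: this entry stalls and blocks the one behind it
        if dispatched >= cq_vacancies:
            break                      # no completion queue entry left to allocate
        available.discard(unit)        # at most one instruction per unit per cycle
        dispatched += 1
    return dispatched
```

For instance, with a floating-point instruction in IQ0 and the FPU busy, nothing is dispatched even if the instruction in IQ1 could use an idle integer unit, as in cycle 4 of the cache-hit example.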


The completion unit maintains program order after instructions are dispatched from the 
instruction queue, guaranteeing in-order completion and a precise exception model. 
Completing an instruction implies committing execution results to the architected 
destination registers. In-order completion ensures the correct architectural state when the 
MPC750 must recover from a mispredicted branch or an exception. 


Instruction state and all information required for completion is kept in the six-entry, 
first-in/first-out completion queue. A completion queue entry is allocated for each 
instruction when it is dispatched to an execute unit; if no entry is available, the dispatch unit 
stalls. A maximum of two instructions per cycle may be completed and retired from the 
completion queue, and the flow of instructions can stall when a longer-latency instruction 
reaches the last position in the completion queue. Subsequent instructions cannot be 
completed and retired until that longer-latency instruction completes and retires. Examples 
of this are shown in Section 6.3.2.2, “Cache Hit,” and Section 6.3.2.3, “Cache Miss.” 


The MPC750 can execute instructions out of order, but in-order completion by the 
completion unit ensures a precise exception mechanism. Program-related exceptions are 
signaled when the instruction causing the exception reaches the last position in the 
completion queue. Prior instructions are allowed to complete before the exception is taken. 
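
A minimal sketch of the in-order retirement rule described above follows. It models only the queue discipline (six entries, at most two retirements per cycle, oldest first); exceptions and write-back are omitted, and the names are hypothetical.

```python
from collections import deque

CQ_ENTRIES = 6             # six-entry completion queue
MAX_RETIRE_PER_CYCLE = 2   # at most two instructions complete and retire per cycle


def can_allocate(cq: deque) -> bool:
    """Dispatch stalls when no completion queue entry is available."""
    return len(cq) < CQ_ENTRIES


def retire(cq: deque, finished: set) -> list:
    """Retire up to two instructions from the head of the queue, in program order.

    cq holds instruction identifiers with the oldest at the left;
    finished is the set of identifiers that have finished executing.
    """
    retired = []
    while cq and len(retired) < MAX_RETIRE_PER_CYCLE and cq[0] in finished:
        retired.append(cq.popleft())
    return retired
```

If the oldest entry has not finished executing, nothing retires, even when younger instructions are done. This is the stall behind a longer-latency instruction that the cache-hit example shows for instruction 2 waiting on instruction 1.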


6.3.3.1 Rename Register Operation 


To avoid contention for a given register file location in the course of out-of-order execution, 
the MPC750 provides rename registers for holding instruction results before the 
completion unit commits them to the architected register. There are six GPR rename registers, 
six FPR rename registers, and one each for the CR, LR, and CTR. 


When the dispatch unit dispatches an instruction to its execution unit, it allocates a rename 
register (or registers) for the results of that instruction. If an instruction is dispatched to a 
reservation station associated with an execution unit due to a data dependency, the 
dispatcher also provides a tag to the execution unit identifying the rename register that 
forwards the required data at completion. When the source data reaches the rename register, 
execution can begin. 


Instruction results are transferred from the rename registers to the architected registers by 
the completion unit when an instruction is retired from the completion queue without 
exceptions and after any predicted branch conditions preceding it in the completion queue 
have been resolved correctly. If a branch prediction was incorrect, the instructions 
following the branch are flushed from the completion queue, and any results of those 
instructions are flushed from the rename registers. 


6.3.3.2 Instruction Serialization 


Although the MPC750 can dispatch and complete two instructions per cycle, so-called 
serializing instructions limit dispatch and completion to one instruction per cycle. There are 
three types of instruction serialization: 


• Execution serialization—Execution-serialized instructions are dispatched, held in 
the functional unit, and do not execute until all prior instructions have completed. A 
functional unit holding an execution-serialized instruction will not accept further 
instructions from the dispatcher. For example, execution serialization is used for 
instructions that modify nonrenamed resources. Results from these instructions are 
generally not available or forwarded to subsequent instructions until the instruction 
completes (using mtspr to write to LR or CTR does provide forwarding to branch 
instructions). 

• Completion serialization (also referred to as post-dispatch or tail 
serialization)—Completion-serialized instructions inhibit dispatching of subsequent 
instructions until the serialized instruction completes. Completion serialization is 
used for instructions that bypass the normal rename mechanism. 


• Refetch serialization (flush serialization)—Refetch-serialized instructions inhibit 
dispatch of subsequent instructions and force refetching of subsequent instructions 
after completion. 


6.4 Execution Unit Timings 


The following sections describe instruction timing considerations within each of the 
respective execution units in the MPC750. 


6.4.1 Branch Processing Unit Execution Timing 


Flow control operations (conditional branches, unconditional branches, and traps) are 
typically expensive to execute in most machines because they disrupt normal flow in the 
instruction stream. When a change in program flow occurs, the IQ must be reloaded with 
the target instruction stream. Previously issued instructions will continue to execute while 
the new instruction stream makes its way into the IQ, but depending on whether the target 
instruction is in the BTIC, instruction cache, L2 cache, or in system memory, some 
opportunities may be missed to execute instructions, as the example in Section 6.3.2.3, 
“Cache Miss,” shows. 


Performance features such as branch folding, removal of fall-through branch 
instructions, BTIC, dynamic branch prediction (implemented in the BHT), two-level 
branch prediction, and the implementation of nonblocking caches minimize the penalties 
associated with flow control operations on the MPC750. The timing for branch instruction 
execution is determined by many factors including the following: 

• Whether the branch is taken 


• Whether instructions in the target stream, typically the first two instructions in the 
target stream, are in the branch target instruction cache (BTIC) 


• Whether the target instruction stream is in the on-chip cache 


• Whether the branch is predicted 


• Whether the prediction is correct 


6.4.1.1 Branch Folding and Removal of Fall-Through Branch 
Instructions 


When a branch instruction is encountered by the fetcher, the BPU immediately begins to 
decode it and tries to resolve it. All branch instructions except those that update either the 
LR or CTR are removed from the instruction flow before they would take a position in the 
completion queue. 


Branch folding occurs either when a branch is taken or is predicted as taken (as is the case 
with unconditional branches). When the BPU folds the branch instruction out of the 
instruction stream, the target instruction stream that is fetched into the instruction queue 
overwrites the branch instruction. 


Figure 6-7 shows branch folding. Here a br instruction is encountered in a series of add 
instructions. The branch is resolved as taken. What happens on the next clock cycle depends 
on whether the target instruction stream is in the BTIC, the instruction cache, or if it must 
be fetched from the L2 cache or from system memory. 


Figure 6-7 shows cases where there is a BTIC hit, and when there is a BTIC miss (and 
instruction cache hit). 


If there is a BTIC hit, on the next clock cycle the b instruction is replaced by the target 
instruction, and1, that was found in the BTIC; the second and instruction is also fetched 
from the BTIC. On the next clock cycle, the next four and instructions from the target 
stream are fetched from the instruction cache. 


If the target instruction is not in the BTIC, there is an idle cycle while the fetcher attempts 
to fetch the first four instructions from the instruction cache (on the next clock cycle). In 
the example in Figure 6-7, the first four target instructions are fetched on the next clock. 


If it misses in the caches, an L2 cache or memory access is required, the latency of which 
is dependent on several factors, such as processor/bus clock ratios. In most cases, new 
instructions arrive in the IQ before the execution units become idle. 


Branch Folding (Taken Branch/BTIC Hit) 

      Clock 0   Clock 1   Clock 2 
IQ5   add5 
IQ4   add4 
IQ3   add3                and6 
IQ2   b                   and5 
IQ1   add2      and2      and4 
IQ0   add1      and1      and3 


Branch Folding (Taken Branch/BTIC Miss) 

      Clock 0   Clock 1   Clock 2 
IQ5   add5 
IQ4   add4 
IQ3   add3                and4 
IQ2   b                   and3 
IQ1   add2                and2 
IQ0   add1                and1 

Figure 6-7. Branch Folding 


Figure 6-8 shows the removal of fall-through branch instructions, which occurs when a 
branch is not taken or is predicted as not taken. 


Branch Fall-Through (Not-Taken Branch) 

      Clock 0   Clock 1   Clock 2 
IQ5   add5 
IQ4   add4 
IQ3   add3      add5      add7 
IQ2   b         add4      add6 
IQ1   add2      add3      add5 
IQ0   add1      b         add4 

Figure 6-8. Removal of Fall-Through Branch Instruction 


In this case the branch instruction remains in the instruction queue and is removed from the 
instruction stream as if it were dispatched. However, it is not dispatched to an execution unit 
and is not assigned an entry in the completion queue. 


When a branch instruction is detected before it reaches a dispatch position, and if the branch 
is correctly predicted as taken, folding the branch instruction (and any instructions from the 
incorrect path) reduces the latency required for flow control to zero; instruction execution 
proceeds as though the branch was never there. 


The advantage of removing the fall-through branch instructions at dispatch is only 
marginally less than that of branch folding. Because the branch is not taken, only the branch 
instruction needs to be discarded. The only cost of expelling the branch instruction from 
one of the dispatch entries rather than folding it is missing a chance to dispatch an 
executable instruction from that position. 


6.4.1.2 Branch Instructions and Completion 


As described in the previous section, branch instructions that do not update either the LR or CTR 
are removed from the instruction stream before they reach the completion queue, either by 
branch folding (in the case of taken branches) or by removing fall-through branch 
instructions at dispatch (in the case of non-taken branches). However, branch instructions 
that update the architected LR and CTR must do so in program order and therefore must 
perform write-back in the completion stage, like the instructions that update the FPRs and 
GPRs. 


Branch instructions that update the CTR or LR pass through the instruction queue like 
nonbranch instructions. At the point of dispatch, however, they are not sent to an execution 
unit, but rather are assigned a slot in the completion queue, as shown in Figure 6-9. 


Branch Completion (LR/CTR Write-Back) 

      Clock 0   Clock 1   Clock 2   Clock 3 
IQ5   add5 
IQ4   add4 
IQ3   add3      add5      add7      add9 
IQ2   bc        add4      add6      add8 
IQ1   add2      add3      add5      add7 
IQ0   add1      bc        add4      add6 
CQ5 
CQ4 
CQ3 
CQ2 
CQ1             add2      add3      add5 
CQ0             add1      bc        add4 

Figure 6-9. Branch Completion 


In this example, the bc instruction is encoded to decrement the CTR. It is predicted as 
not-taken in clock cycle 0. In clock cycle 2, bc and add3 are both dispatched. In clock cycle 
3, the architected CTR is updated and the bc instruction is retired from the completion 
queue. 


6.4.1.3 Branch Prediction and Resolution 


The MPC750 supports the following two types of branch prediction: 


• Static branch prediction—This is defined by the PowerPC architecture as part of the 
encoding of branch instructions. 


• Dynamic branch prediction—This is a processor-specific mechanism implemented 
in hardware (in particular the branch history table, or BHT) that monitors branch 
instruction behavior and maintains a record from which the next occurrence of the 
branch instruction is predicted. 


When a conditional branch cannot be resolved due to a CR data dependency, the BPU 
predicts whether it will be taken, and instruction fetching proceeds down the predicted path. 
If the branch prediction resolves as incorrect, the instruction queue and all subsequently 
executed instructions are purged, instructions executed prior to the predicted branch are 
allowed to complete, and instruction fetching resumes down the correct path. 


The MPC750 executes through two levels of prediction. Instructions from the first 
unresolved branch can execute, but they cannot complete until the branch is resolved. If a 
second branch instruction is encountered in the predicted instruction stream, it can be 
predicted and instructions can be fetched, but not executed, from the second branch. No 
action can be taken for a third branch instruction until at least one of the two previous 
branch instructions is resolved. 


The number of instructions that can be executed after the issue of a predicted branch 
instruction is limited by the fact that no instruction executed after a predicted branch may 
actually update the register files or memory until the branch is completed. That is, 
instructions may be issued and executed, but cannot reach the write-back stage in the 
completion unit. When an instruction following a predicted branch completes execution, it 
does not write back its results to the architected registers; instead, it stalls in the completion 
queue. Of course, when the completion queue is full, no additional instructions can be 
dispatched, even if an execution unit is idle. 


In the case of a misprediction, the MPC750 can easily redirect its machine state because the 
programming model has not been updated. When a branch is mispredicted, all instructions 
that were dispatched after the predicted branch instruction are flushed from the completion 
queue and any results are flushed from the rename registers. 


The BTIC is a cache of recently used branch target instructions. If the search for the branch 
target hits in the cache, the first one or two branch target instructions are available in the instruction 
queue on the next cycle (shown in Figure 6-5). Two instructions are fetched on a BTIC hit, 
unless the branch target is the last instruction in a cache block, in which case one instruction 
is fetched. 
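
The one-or-two rule for a BTIC hit can be expressed as a small address check. The sketch below assumes a 32-byte (eight-word) cache block and 4-byte instructions. Neither value is stated in this section, and the function name is hypothetical.

```python
CACHE_BLOCK_BYTES = 32   # assumption: eight-word cache block
INSTRUCTION_BYTES = 4    # PowerPC instructions are one word long


def btic_hit_instruction_count(target_address: int) -> int:
    """Return how many target instructions a BTIC hit supplies for this branch target."""
    offset_in_block = target_address % CACHE_BLOCK_BYTES
    is_last_in_block = offset_in_block == CACHE_BLOCK_BYTES - INSTRUCTION_BYTES
    return 1 if is_last_in_block else 2
```

Under these assumptions, a target at offset 0x1C within its block yields one instruction, and any other word-aligned target yields two.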


In some situations, an instruction sequence creates dependencies that keep a branch 
instruction from being resolved immediately, thereby delaying execution of the subsequent 
instruction stream based on the predicted outcome of the branch instruction. The instruction 
sequences and the resulting action of the branch instruction are described as follows: 


• An mtspr(LR) followed by a bclr—Fetching stops and the branch waits for the 
mtspr to execute. 


• An mtspr(CTR) followed by a bcctr—Fetching stops and the branch waits for the 
mtspr to execute. 


• An mtspr(CTR) followed by a bc (CTR decrement)—Fetching stops and the 
branch waits for the mtspr to execute. 


• A third bc(based-on-CR) is encountered while there are two unresolved 
bc(based-on-CR). The third bc(based-on-CR) is not executed and fetching stops 
until one of the previous bc(based-on-CR) is resolved. (Note that branch conditions 
can be a function of the CTR and the CR; if the CTR condition is sufficient to resolve 
the branch, then a CR-dependency is ignored.) 


6.4.1.3.1 Static Branch Prediction 


The PowerPC architecture provides a field in branch instructions (the BO field) to allow 
software to hint whether a branch is likely to be taken. Rather than delaying instruction 
processing until the condition is known, the MPC750 uses the instruction encoding to 
predict whether the branch is likely to be taken and begins fetching and executing along that 
path. When the branch condition is known, the prediction is evaluated. If the prediction was 
correct, program flow continues along that path; otherwise, the processor flushes any 
instructions and their results from the mispredicted path, and program flow resumes along 
the correct path. 


Static branch prediction is used when HID0[BHT] is cleared. That is, the branch history 
table, which is used for dynamic branch prediction, is disabled. For information about static 
branch prediction, see “Conditional Branch Control,” in Chapter 4, “Addressing Modes and 
Instruction Set Summary,” in the Programming Environments Manual. 
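
As a rough illustration of how an encoded hint can drive the prediction, the sketch below follows the convention we understand the Programming Environments Manual to describe for the y bit of the BO field: with y = 0, a bc with a negative displacement is predicted taken, other conditional branches are predicted not taken, and y = 1 reverses the default. This is an assumption to be checked against that manual. The function name and arguments are hypothetical.

```python
def static_prediction_taken(mnemonic: str, displacement: int, y_bit: int) -> bool:
    """Return True if a conditional branch is statically predicted taken.

    mnemonic:     'bc', 'bclr', or 'bcctr' (link and absolute variants not modeled)
    displacement: signed branch displacement; used only for 'bc'
    y_bit:        prediction hint bit from the BO field (0 or 1)
    """
    if y_bit not in (0, 1):
        raise ValueError("y_bit must be 0 or 1")
    if mnemonic == "bc":
        default_taken = displacement < 0    # backward branches default to taken
    elif mnemonic in ("bclr", "bcctr"):
        default_taken = False               # register-target branches default to not taken
    else:
        raise ValueError("not a conditional branch mnemonic handled by this sketch")
    return default_taken if y_bit == 0 else not default_taken
```

A loop-closing backward bc with y = 0 is therefore predicted taken, and setting y = 1 on a forward bc predicts it taken as well.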


6.4.1.3.2 Predicted Branch Timing Examples 


Figure 6-10 shows cases where branch instructions are predicted. It shows how both taken 
and not-taken branches are handled and how the MPC750 handles both correct and 
incorrect predictions. The example shows the timing for the following instruction sequence: 


0 add 
1 add 
2 bc 
3 mulhw 
4 bc T0 
5 fadd 
6 and 
7 add 
T0 add 
T1 add 
T2 add 
T3 add 
T4 and 
T5 or 


Timing diagram for instructions 0-5 and T0-T5, followed by the instruction queue and 
completion queue contents in each clock cycle. Legend: Fetch; In dispatch entry (IQ0/IQ1); 
Predict; Execute; Complete (In CQ); In retirement entry (CQ0/CQ1). 

* Instructions 5 and 6 are not in the IQ in clock cycle 5. Here, the fetch stage shows cache latency. 


Figure 6-10. Branch Instruction Timing 


0. During clock cycle 0, instructions 0 and 1 are dispatched to their respective 
execution units. Instruction 2 is a branch instruction that updates the CTR. It is 
predicted as not taken in clock cycle 0. Instruction 3 is a mulhw instruction on which 
instruction 4 depends. 


1. In clock cycle 1, instructions 2 and 3 enter the dispatch entries in the IQ. Instruction 
4 (a second bc instruction) and 5 are fetched. The second bc instruction is predicted 
as taken. It can be folded, but it cannot be resolved until instruction 3 writes back. 


2. In clock cycle 2, instruction 4 has been folded and instruction 5 has been flushed 
from the IQ. The two target instructions, T0 and T1, are both in the BTIC, so they 
are fetched in this cycle. Note that even though the first bc instruction may not have 
resolved by this point (we can assume it has), the MPC750 allows fetching from a 
second predicted branch stream. However, these instructions could not be 
dispatched until the previous branch has resolved. 


3. In clock cycle 3, target instructions T2-T5 are fetched as T0 and T1 are dispatched. 


4. In clock cycle 4, instruction 3, on which the second branch instruction depended, 
writes back and the branch prediction is proven incorrect. Even though T0 is in 
CQ1, from which it could be written back, it is not written back because the branch 
prediction was incorrect. All target instructions are flushed from their positions in 
the pipeline at the end of this clock cycle, as are any results in the rename registers. 


After one clock cycle required to refetch the original instruction stream, instruction 5, the 
same instruction that was fetched in clock cycle 1, is brought back into the IQ from the 
instruction cache, along with three others (not all of which are shown). 


6.4.2 Integer Unit Execution Timing 


The MPC750 has two integer units. The IU1 can execute all integer instructions, and the 
IU2 can execute all integer instructions except multiply and divide instructions. As shown 
in Figure 6-2, each integer unit has one execute pipeline stage; thus, when a multicycle 
integer instruction is being executed, no other integer instructions can begin to execute. 
Table 6-6 lists integer instruction latencies. 


Most integer instructions have an execution latency of one clock cycle. 


6.4.3 Floating-Point Unit Execution Timing 


The floating-point unit on the MPC750 executes all floating-point instructions. Execution 
of most floating-point instructions is pipelined within the FPU, allowing up to three 
instructions to be executing in the FPU concurrently. While most floating-point instructions 
execute with three- or four-cycle latency, and one- or two-cycle throughput, three 
instructions (fdivs, fdiv, and fres) execute with latencies of 11 to 33 cycles. The fdivs, fdiv, 
fres, mtfsb0, mtfsb1, mtfsfi, mffs, and mtfsf instructions block the floating-point unit 
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pipeline until they complete execution, and thereby inhibit the dispatch of additional 
floating-point instructions. See Table 6-7 for floating-point instruction execution timing. 


6.4.4 Effect of Floating-Point Exceptions on Performance 


For the fastest and most predictable floating-point performance, all exceptions should be 
disabled in the FPSCR and MSR. 


6.4.5 Load/Store Unit Execution Timing 


The execution of most load and store instructions is pipelined. The LSU has two pipeline 
stages. The first is for effective address calculation and MMU translation and the second is 
for accessing data in the cache. Load and store instructions have a two-cycle latency and 
one-cycle throughput. 
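
The latency and throughput figures above can be turned into a rough cycle estimate. The following is a minimal sketch, assuming the accesses are independent, hit in the cache, and are issued back to back; the function name `pipelined_cycles` is hypothetical and not part of any MPC750 tool.

```python
def pipelined_cycles(count, latency=2, throughput=1):
    """Estimate cycles to finish `count` back-to-back pipelined accesses.

    The first access needs the full latency; each later access finishes
    `throughput` cycles after the one before it. Assumes no cache misses,
    no misalignment, and no dependencies between the accesses.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        return 0
    return latency + (count - 1) * throughput


if __name__ == "__main__":
    # Four independent loads: 2 cycles for the first, 1 more for each of the rest.
    print(pipelined_cycles(4))
```

Under these assumptions the cost of a run of loads grows by one cycle per additional load, which is why keeping the LSU pipeline full matters more than the two-cycle latency itself.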


If operands are misaligned, additional latency may be required either for an alignment 
exception to be taken or for additional bus accesses. Load instructions that miss in the cache 
block subsequent cache accesses during the cache line refill. Table 6-8 gives load and store 
instruction execution latencies. 


6.4.6 Effect of Operand Placement on Performance 


The PowerPC VEA states that the placement (location and alignment) of operands in 
memory may affect the relative performance of memory accesses, and in some cases affect 
it significantly. The effects memory operand placement has on performance are shown in 
Table 6-1. 


The best performance is guaranteed if memory operands are aligned on natural boundaries. 
For the best performance across the widest range of implementations, the programmer 
should assume the performance model described in Chapter 3, “Operand Conventions,” in 
the Programming Environments Manual. 


The effect of misalignment on memory access latency is the same for big- and little-endian 
addressing modes except for multiple and string operations that cause an alignment 
exception in little-endian mode. 


Table 6-1. Performance Effects of Memory Operand Placement 

         Operand                             Boundary Crossing
Size           Byte Alignment    None        8 Byte    Cache Block    Protection Boundary

Integer
4 byte         4                 Optimal¹    —         —              —
               <4                Optimal     Good      Good           Good
2 byte         2                 Optimal     —         —              —
               <2                Optimal     Good      Good           Good
1 byte         1                 Optimal     —         —              —
lmw, stmw²     4                 Good³       Good      Good           Good
               <4                Poor⁴       Poor      Poor           Poor
String²        —                 Good        Good      Good           Good

Floating-Point
8 byte         8                 Optimal     —         —              —
               4                 —           Good      Good           Good
               <4                —           Poor      Poor           Poor
4 byte         4                 Optimal     —         —              —
               <4                Poor        Poor      Poor           Poor

Notes: 
1 Optimal means one EA calculation occurs. 
2 Not supported in little-endian mode, causes an alignment exception. 
3 Good means multiple EA calculations occur that may cause additional bus activities with multiple bus transfers. 
4 Poor means that an alignment exception occurs. 
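
To look up a row and column of Table 6-1 for a given access, one needs to know whether the operand is naturally aligned and which boundaries it crosses. The sketch below shows that arithmetic. The 32-byte cache block is the MPC750 L1 block size; treating the protection boundary as a 4-Kbyte page is an assumption made here for illustration, and the names `crosses` and `placement` are hypothetical.

```python
CACHE_BLOCK_BYTES = 32   # L1 cache block size on the MPC750
PAGE_BYTES = 4096        # assumption: protection boundary taken as a 4-Kbyte page


def crosses(addr, size, boundary):
    """True if the operand occupying [addr, addr + size) spans a multiple of `boundary`."""
    return addr // boundary != (addr + size - 1) // boundary


def placement(addr, size):
    """Summarize alignment and boundary crossings for one memory operand."""
    return {
        "naturally_aligned": addr % size == 0,
        "crosses_8_byte": crosses(addr, size, 8),
        "crosses_cache_block": crosses(addr, size, CACHE_BLOCK_BYTES),
        "crosses_protection": crosses(addr, size, PAGE_BYTES),
    }


if __name__ == "__main__":
    # A 4-byte integer at an address that is only 2-byte aligned.
    print(placement(0x1006, 4))
```

The result only identifies the table cell; the performance rating itself (Optimal, Good, or Poor) still comes from the table.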


6.4.7 Integer Store Gathering 


The MPC750 performs store gathering for write-through operations to nonguarded space 
and for cache-inhibited stores to nonguarded space, provided the stores are 4-byte, 
word-aligned stores. 
These stores are combined in the LSU to form a double word and are sent out on the 60x 
bus as a single-beat operation. However, stores are gathered only if the successive stores 
meet the criteria and are queued and pending. Store gathering occurs regardless of the 
address order of the stores. Store gathering is enabled by setting HID0[SGE]. Stores can be 
gathered in both endian modes. 


Store gathering is not done for the following: 
• Cacheable store operations 
• Stores to guarded cache-inhibited or write-through space 
• Byte-reverse store operations 
• stwcx. instructions 
• ecowx instructions 
• A store that occurs during a table search operation 
• Floating-point store operations 

If store gathering is enabled and the stores do not fall under the above categories, an eieio 
or sync instruction must be used to prevent two stores from being gathered. 
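
The gathering criteria above can be summarized as a predicate. This is only a sketch of the rules as stated in this section, not a model of the LSU: the `Store` record and its field names are hypothetical, and requiring the two words to fall in the same aligned double word is an assumption drawn from the statement that the stores are combined into a double word.

```python
from dataclasses import dataclass


@dataclass
class Store:
    addr: int
    size: int
    write_through: bool = False
    cache_inhibited: bool = False
    guarded: bool = False
    # True for byte-reverse, stwcx., ecowx, table-search, or floating-point stores.
    excluded_kind: bool = False


def eligible(store):
    """A single store meets the gathering criteria listed in the text."""
    return (
        store.size == 4
        and store.addr % 4 == 0
        and not store.guarded
        and (store.write_through or store.cache_inhibited)
        and not store.excluded_kind
    )


def can_gather(first, second, sge_enabled=True):
    """Two queued, pending stores may be gathered into one double word.

    Address order does not matter. Same-double-word and distinct-word
    checks are assumptions of this sketch.
    """
    same_double_word = first.addr // 8 == second.addr // 8
    return (
        sge_enabled
        and eligible(first)
        and eligible(second)
        and same_double_word
        and first.addr != second.addr
    )


if __name__ == "__main__":
    low = Store(addr=0x2000, size=4, cache_inhibited=True)
    high = Store(addr=0x2004, size=4, cache_inhibited=True)
    print(can_gather(high, low))
```

A device driver that must keep two such stores separate on the bus would place an eieio or sync between them, as the text notes.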


6.4.8 System Register Unit Execution Timing 


Most instructions executed by the SRU either directly access renamed registers or access or 
modify nonrenamed registers. They generally execute in a serial manner. Results from these 
instructions are not available to subsequent instructions until the instruction completes and 
is retired. See Section 6.3.3.2, “Instruction Serialization,” for more information on 
serializing instructions executed by the SRU, and refer to Table 6-4 and Table 6-5 for SRU 
instruction execution timings. 


6.5 Memory Performance Considerations 


Because the MPC750 can have a maximum instruction throughput of three instructions per 
clock cycle, lack of memory bandwidth can affect performance. For the MPC750 to 
maximize performance, it must be able to read and write data efficiently. If a system has 
multiple bus devices, one of them may experience long memory latencies while another bus 
master (for example, a DMA controller) is using the external bus. 


6.5.1 Caching and Memory Coherency 


To minimize the effect of bus contention, the PowerPC architecture defines WIM bits that 
are used to configure memory regions as caching-enforced or caching-inhibited. Accesses 
to caching-inhibited memory locations never update the on-chip cache. If a cache-inhibited 
access hits the on-chip cache, the cache block is invalidated. If the cache block is marked 
modified, it is copied back to memory before being invalidated. Where caching is permitted, 
memory is configured as either write-back or write-through, which are described as follows: 


• Write-back—Configuring a memory region as write-back lets a processor modify 
data in the cache without updating system memory. For such locations, memory 
updates occur only on modified cache block replacements, cache flushes, or when 
one processor needs data that is modified in another’s cache. Therefore, configuring 
memory as write-back can help when bus traffic could cause bottlenecks, especially 
for multiprocessor systems and for regions in which data, such as local variables, is 
used often and is coupled closely to a processor. 


If multiple devices use data in a memory region marked write-through, snooping 
must be enabled to allow the copy-back and cache invalidation operations necessary 
to ensure cache coherency. The MPC750’s snooping hardware keeps other devices 
from accessing invalid data. When snooping is enabled, the MPC750 
monitors transactions of other bus devices. For example, if another device needs data 
that is modified in the MPC750’s cache, the access is delayed so the MPC750 can 
copy the modified data to memory. 

• Write-through—Store operations to memory marked write-through always update 
both system memory and the on-chip cache on cache hits. Because valid cache 
contents always match system memory marked write-through, cache hits from other 
devices do not cause modified data to be copied back as they do for locations marked 
write-back. However, all write operations are passed to the bus, which can limit 
performance. Load operations that miss the on-chip cache must wait for the external 
store operation. 


Write-through configuration is useful when cached data must agree with external 
memory (for example, video memory), when shared (global) data may be needed 
often, or when it is undesirable to allocate a cache block on a cache miss. 


Chapter 3, “L1 Instruction and Data Cache Operation,” describes the caches, memory 
configuration, and snooping in detail. 


6.5.2 Effect of TLB Miss 


If a page address translation is not in a TLB, the MPC750 hardware searches the page tables 
and updates the TLB when a translation is found. Table 6-2 shows the estimated latency for 
the hardware TLB load for different cache configurations and conditions. 


Table 6-2. TLB Miss Latencies 

L1 Condition              L2 Condition       Processor/L2    Processor/System Bus        Estimated Latency
(Instruction and Data)                       Clock Ratio     Clock Ratio                 (Cycles)

100% cache hit            —                  —               —                           7
100% cache miss           100% cache hit     1:1             —                           13
100% cache miss           100% cache hit     1.5:1           —                           18
100% cache miss           100% cache hit     2:1             —                           20
100% cache miss           100% cache miss    1:1             2.5:1 (6:3:3:3 memory)      62
100% cache miss           100% cache miss    1:1             4:1 (5:2:2:2 memory)        77

The PTE table search assumes a hit in the first entry of the primary PTEG. 
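
The estimates in Table 6-2 can be used for a back-of-the-envelope cost of TLB misses in a workload. The sketch below simply multiplies a miss count by the tabulated latency; the scenario labels and the function name `tlb_miss_cycles` are hypothetical, and the result inherits the table's assumption of a hit in the first entry of the primary PTEG.

```python
# Estimated hardware TLB load latency in processor cycles, from Table 6-2.
TLB_MISS_LATENCY = {
    "l1_hit": 7,
    "l1_miss_l2_hit_1:1": 13,
    "l1_miss_l2_hit_1.5:1": 18,
    "l1_miss_l2_hit_2:1": 20,
    "l2_miss_bus_2.5:1": 62,
    "l2_miss_bus_4:1": 77,
}


def tlb_miss_cycles(miss_count, scenario):
    """Total estimated cycles spent on hardware table searches."""
    return miss_count * TLB_MISS_LATENCY[scenario]


if __name__ == "__main__":
    # Compare the best and worst tabulated cases for 1,000 TLB misses.
    print(tlb_miss_cycles(1000, "l1_hit"))
    print(tlb_miss_cycles(1000, "l2_miss_bus_4:1"))
```

Real workloads mix these cases, so the two extremes are better read as bounds than as predictions.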


6.6 Instruction Scheduling Guidelines 


The performance of the MPC750 can be improved by avoiding resource conflicts and 
scheduling instructions to take fullest advantage of the parallel execution units. Instruction 
scheduling on the MPC750 can be improved by observing the following guidelines: 


• To reduce mispredictions, separate the instruction that sets CR bits from the branch 
instruction that evaluates them. Because there can be no more than 12 instructions 
in the processor (with the instruction that sets CR in CQ0 and the dependent branch 
instruction in IQ5), there is no advantage to having more than 10 instructions 
between them. 

• Likewise, when branching to a location specified by the CTR or LR, separate the 
mtspr instruction that initializes the CTR or LR from the dependent branch 
instruction. This ensures the register values are immediately available to the branch 
instruction. 

• Schedule instructions such that two can be dispatched at a time. 

• Schedule instructions to minimize stalls due to execution units being busy. 

• Avoid scheduling high-latency instructions close together. Interspersing 
single-cycle latency instructions between longer-latency instructions minimizes the 
effect that instructions such as integer divide and multiply can have on throughput. 

• Avoid using serializing instructions. 

• Schedule instructions to avoid dispatch stalls: 

— Six instructions can be tracked in the completion queue; therefore, only six 
instructions can be in the execute stages at any one time. 

— There are six GPR rename registers; therefore, only six GPRs can be specified as 
destination operands at any time. If no rename registers are available, 
instructions cannot enter the execute stage and remain in the reservation station 
or instruction queue until they become available. 

Note that load with update address instructions use two destination registers. 

— Similarly, there are six FPR rename registers, so only six FPR destination 
operands can be in the execute and complete stages at any time. 


6.6.1 Branch, Dispatch, and Completion Unit Resource Requirements 


This section describes the specific resources required to avoid stalls during branch 
resolution, instruction dispatching, and instruction completion. 


6.6.1.1 Branch Resolution Resource Requirements 


The following is a list of branch instructions and the resources required to avoid stalling the 
fetch unit in the course of branch resolution: 


• The bclr instruction requires LR availability. 
• The bcctr instruction requires CTR availability. 
• Branch and link instructions require shadow LR availability. 
• The “branch conditional on counter decrement and the CR” condition requires CTR 
availability or the CR condition must be false, and the MPC750 cannot execute 
instructions after an unresolved predicted branch when the BPU encounters a 
branch. 
• A branch conditional on CR condition cannot be executed following an unresolved 
predicted branch instruction. 
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6.6.1.2 Dispatch Unit Resource Requirements 
The following is a list of resources required to avoid stalls in the dispatch unit. IQ[0] and 
IQ[1] are the two dispatch entries in the instruction queue: 
• Requirements for dispatching from IQ[0] are as follows: 
— Needed execution unit is available. 
— Needed GPR rename registers are available. 
— Needed FPR rename registers are available. 
— Completion queue is not full. 
— A completion-serialized instruction is not being executed. 
• Requirements for dispatching from IQ[1] are as follows: 
— Instruction in IQ[0] must dispatch. 
— Instruction dispatched by IQ[0] is not completion- or refetch-serialized. 
— Needed execution unit is available (after dispatch from IQ[0]). 
— Needed GPR rename registers are available (after dispatch from IQ[0]). 
— Needed FPR rename register is available (after dispatch from IQ[0]). 
— Completion queue is not full (after dispatch from IQ[0]). 
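
The dispatch rules above lend themselves to a small checker that reports how many of the two dispatch entries can issue in a cycle. This is a simplified sketch, not a description of the actual dispatch logic: the `Instr` record, its fields, and `dispatch_count` are hypothetical, and an instruction simply takes the first free unit in its own preference order.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    units: tuple              # units able to execute it, in preference order
    gpr_dest: int = 0         # GPR rename registers needed
    fpr_dest: int = 0         # FPR rename registers needed
    serialized: bool = False  # completion- or refetch-serialized


def dispatch_count(iq0, iq1, free_units, free_gpr, free_fpr, cq_free,
                   serialized_executing=False):
    """Return how many of IQ[0] and IQ[1] (0, 1, or 2) can dispatch this cycle."""
    free = set(free_units)

    def try_dispatch(instr):
        nonlocal free_gpr, free_fpr, cq_free
        unit = next((u for u in instr.units if u in free), None)
        if (unit is None or instr.gpr_dest > free_gpr
                or instr.fpr_dest > free_fpr or cq_free < 1):
            return False
        # Consume the resources so IQ[1] is checked "after dispatch from IQ[0]".
        free.discard(unit)
        free_gpr -= instr.gpr_dest
        free_fpr -= instr.fpr_dest
        cq_free -= 1
        return True

    if iq0 is None or serialized_executing or not try_dispatch(iq0):
        return 0
    if iq1 is None or iq0.serialized or not try_dispatch(iq1):
        return 1
    return 2


if __name__ == "__main__":
    add = Instr(units=("IU2", "IU1"), gpr_dest=1)
    mulhw = Instr(units=("IU1",), gpr_dest=1)
    print(dispatch_count(add, mulhw, {"IU1", "IU2"}, 6, 6, 6))
    print(dispatch_count(mulhw, mulhw, {"IU1", "IU2"}, 6, 6, 6))
```

Pairing an instruction that can run on either integer unit with one that needs IU1 lets both dispatch, while two multiplies in a row compete for IU1 and only the first dispatches.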


6.6.1.3 Completion Unit Resource Requirements 


The following is a list of resources required to avoid stalls in the completion unit; note that 
the two completion entries are described as CQ[0] and CQ[1], where CQ[0] is the 
entry located at the end of the completion queue (see Figure 6-4). 


• Requirements for completing an instruction from CQ[0] are as follows: 
— Instruction in CQ[0] must be finished. 
— Instruction in CQ[0] must not follow an unresolved predicted branch. 
— Instruction in CQ[0] must not cause an exception. 
• Requirements for completing an instruction from CQ[1] are as follows: 
— Instruction in CQ[0] must complete in same cycle. 
— Instruction in CQ[1] must be finished. 
— Instruction in CQ[1] must not follow an unresolved predicted branch. 
— Instruction in CQ[1] must not cause an exception. 
— Instruction in CQ[1] must be an integer or load instruction. 
— Number of CR updates from both CQ[0] and CQ[1] must not exceed two. 
— Number of GPR updates from both CQ[0] and CQ[1] must not exceed two. 
— Number of FPR updates from both CQ[0] and CQ[1] must not exceed two. 
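
The completion rules can be checked the same way. The following sketch counts how many instructions may retire in one cycle under the requirements listed above; the `Entry` record and `completion_count` are hypothetical names, and the model ignores everything not mentioned in the list.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    finished: bool = True
    after_unresolved_branch: bool = False
    exception: bool = False
    kind: str = "integer"   # "integer", "load", or any other class
    cr: int = 0             # CR updates written at completion
    gpr: int = 0            # GPR updates written at completion
    fpr: int = 0            # FPR updates written at completion


def _ready(entry):
    """Conditions that apply to either completion entry."""
    return (entry.finished
            and not entry.after_unresolved_branch
            and not entry.exception)


def completion_count(cq0, cq1):
    """Return how many of CQ[0] and CQ[1] (0, 1, or 2) can complete this cycle."""
    if cq0 is None or not _ready(cq0):
        return 0
    if cq1 is None or not _ready(cq1) or cq1.kind not in ("integer", "load"):
        return 1
    # Combined write-back limits: at most two updates per register class.
    for field in ("cr", "gpr", "fpr"):
        if getattr(cq0, field) + getattr(cq1, field) > 2:
            return 1
    return 2


if __name__ == "__main__":
    add = Entry(gpr=1)
    load_update = Entry(kind="load", gpr=2)   # load with update writes two GPRs
    print(completion_count(add, add))
    print(completion_count(load_update, add))
```

The second case shows why a load with update followed by another GPR-writing instruction retires over two cycles: together they would need three GPR updates.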
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6.7 Instruction Latency Summary 


Table 6-3 through Table 6-8 list latencies associated with instructions executed by each 
execution unit. Table 6-3 describes branch instruction latencies. 


Table 6-3. Branch Instructions 

Mnemonic     Primary    Extended    Latency
b[l][a]      18         —           Unless these instructions update either the CTR or the LR,
bc[l][a]     16         —           branch operations are folded if they are either taken or
bcctr[l]     19         528         predicted as taken. They fall through if they are not taken
bclr[l]      19         16          or predicted as not taken.

Table 6-4 lists system register instruction latencies. 


Table 6-4. System Register Instructions 

Mnemonic               Primary    Extended    Unit    Cycles    Serialization
eieio                  31         854         SRU     1         —
isync                  19         150         SRU     2         Completion, refetch
mfmsr                  31         83          SRU     1         —
mfspr (DBATs)          31         339         SRU     3         Execution
mfspr (IBATs)          31         339         SRU     3         —
mfspr (not I/DBATs)    31         339         SRU     1         Execution
mfsr                   31         595         SRU     3         —
mfsrin                 31         659         SRU     3         Execution
mftb                   31         371         SRU     1         —
mtmsr                  31         146         SRU     1         Execution
mtspr (DBATs)          31         467         SRU     2         Execution
mtspr (IBATs)          31         467         SRU     2         Execution
mtspr (not I/DBATs)    31         467         SRU     2         Execution
mtsr                   31         210         SRU     2         Execution
mtsrin                 31         242         SRU     2         Execution
mttb                   31         467         SRU     1         Execution
rfi                    19         50          SRU     2         Completion, refetch
sc                     17         --1         SRU     2         Completion, refetch
sync                   31         598         SRU     3¹        —
tlbsync²               31         566         —       —

Notes: 

1 This assumes no pending stores in the store queue. If there are, the sync completes after they complete to memory. 
If broadcast is enabled on the 60x bus, sync completes only after a successful broadcast. 

2 tlbsync is dispatched only to the completion buffer (not to any execution unit) and is marked finished as it is 
dispatched. Upon retirement, it waits for an external TLBISYNC signal to be asserted. In most systems TLBISYNC 
is always asserted so the instruction is a no-op. 


Table 6-5 lists condition register logical instruction latencies. 


Table 6-5. Condition Register Logical Instructions 

Mnemonic    Primary    Extended    Unit    Cycles    Serialization
crand       19         257         SRU     1         Execution
crandc      19         129         SRU     1         Execution
creqv       19         289         SRU     1         Execution
crnand      19         225         SRU     1         Execution
crnor       19         33          SRU     1         Execution
cror        19         449         SRU     1         Execution
crorc       19         417         SRU     1         Execution
crxor       19         193         SRU     1         Execution
mcrf        19         0           SRU     1         Execution
mcrxr       31         512         SRU     1         Execution
mfcr        31         19          SRU     1         Execution
mtcrf       31         144         SRU     1         Execution

Table 6-6 shows integer instruction latencies. Note that the IU1 executes all integer 
arithmetic instructions—multiply, divide, shift, rotate, arithmetic, and compare. The IU2 
executes all integer instructions except multiply and divide (that is, shift, rotate, logical, and 
compare). 


Table 6-6. Integer Instructions 

Mnemonic        Primary    Extended    Unit       Cycles       Serialization
addc[o][.]      31         10          IU1/IU2    1            —
adde[o][.]      31         138         IU1/IU2    1            Execution
addi            14         —           IU1/IU2    1            —
addic           12         —           IU1/IU2    1            —
addic.          13         —           IU1/IU2    1            —
addis           15         —           IU1/IU2    1            —
addme[o][.]     31         234         IU1/IU2    1            Execution
addze[o][.]     31         202         IU1/IU2    1            Execution
add[o][.]       31         266         IU1/IU2    1            —
andc[.]         31         60          IU1/IU2    1            —
andi.           28         —           IU1/IU2    1            —
andis.          29         —           IU1/IU2    1            —
and[.]          31         28          IU1/IU2    1            —
cmp             31         0           IU1/IU2    1            —
cmpi            11         —           IU1/IU2    1            —
cmpl            31         32          IU1/IU2    1            —
cmpli           10         —           IU1/IU2    1            —
cntlzw[.]       31         26          IU1/IU2    1            —
divwu[o][.]     31         459         IU1        19           —
divw[o][.]      31         491         IU1        19           —
eqv[.]          31         284         IU1/IU2    1            —
extsb[.]        31         954         IU1/IU2    1            —
extsh[.]        31         922         IU1/IU2    1            —
mulhwu[.]       31         11          IU1        2,3,4,5,6    —
mulhw[.]        31         75          IU1        2,3,4,5      —
mulli           7          —           IU1        2,3          —
mullw[o][.]     31         235         IU1        2,3,4,5      —
nand[.]         31         476         IU1/IU2    1            —
neg[o][.]       31         104         IU1/IU2    1            —
nor[.]          31         124         IU1/IU2    1            —
orc[.]          31         412         IU1/IU2    1            —
ori             24         —           IU1/IU2    1            —
oris            25         —           IU1/IU2    1            —
or[.]           31         444         IU1/IU2    1            —
rlwimi[.]       20         —           IU1/IU2    1            —
rlwinm[.]       21         —           IU1/IU2    1            —
rlwnm[.]        23         —           IU1/IU2    1            —
slw[.]          31         24          IU1/IU2    1            —
srawi[.]        31         824         IU1/IU2    1            —
sraw[.]         31         792         IU1/IU2    1            —
srw[.]          31         536         IU1/IU2    1            —
subfc[o][.]     31         8           IU1/IU2    1            —
subfe[o][.]     31         136         IU1/IU2    1            Execution
subfic          8          —           IU1/IU2    1            —
subfme[o][.]    31         232         IU1/IU2    1            Execution
subfze[o][.]    31         200         IU1/IU2    1            Execution
subf[.]         31         40          IU1/IU2    1            —
tw              31         4           IU1/IU2    2            —
twi             3          —           IU1/IU2    2            —
xori            26         —           IU1/IU2    1            —
xoris           27         —           IU1/IU2    1            —
xor[.]          31         316         IU1/IU2    1            —





Table 6-7 shows latencies for floating-point instructions. Pipelined floating-point 
instructions are shown with number of clocks in each pipeline stage separated by dashes. 
Floating-point instructions with a single entry in the cycles column are not pipelined; when 
the FPU executes these nonpipelined instructions, it remains busy for the full duration of 
the instruction execution and is not available for subsequent instructions. 
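
The dashed notation can be read mechanically. The sketch below is one interpretation, offered as an assumption rather than a definition from this manual: total latency is taken as the sum of the per-stage clocks, and the issue interval as the largest single-stage count. That reading matches the three- or four-cycle latency and one- or two-cycle throughput quoted in Section 6.4.3. The function name `fpu_timing` is hypothetical.

```python
def fpu_timing(cycles):
    """Return (latency, issue_interval) for a Cycles entry from Table 6-7.

    "1-1-1" or "2-1-1" describe a pipelined instruction, one number per
    pipeline stage. A single number such as "3" describes a nonpipelined
    instruction that keeps the FPU busy for its full duration.
    """
    stages = [int(part) for part in cycles.split("-")]
    if len(stages) == 1:
        return stages[0], stages[0]
    return sum(stages), max(stages)


if __name__ == "__main__":
    for entry in ("1-1-1", "2-1-1", "3"):
        print(entry, fpu_timing(entry))
```

Under this reading a double-precision multiply-add entry of 2-1-1 costs one more cycle of latency than a single-precision one and halves the rate at which such instructions can enter the FPU.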


Table 6-7. Floating-Point Instructions 

Mnemonic       Primary    Extended    Unit    Cycles    Serialization
fabs[.]        63         264         FPU
fadds[.]       59         21          FPU
fadd[.]        63         21          FPU
fcmpo          63         32          FPU
fcmpu          63         0           FPU
fctiwz[.]      63         15          FPU
fctiw[.]       63         14          FPU
fdivs[.]       59         18          FPU
fdiv[.]        63         18          FPU
fmadds[.]      59         29          FPU
fmadd[.]       63         29          FPU
fmr[.]         63         72          FPU
fmsubs[.]      59         28          FPU     1-1-1     —
fmsub[.]       63         28          FPU     2-1-1     —
fmuls[.]       59         25          FPU     1-1-1     —
fmul[.]        63         25          FPU     2-1-1     —
fnabs[.]       63         136         FPU     1-1-1     —
fneg[.]        63         40          FPU     1-1-1     —
fnmadds[.]     59         31          FPU     1-1-1     —
fnmadd[.]      63         31          FPU     2-1-1     —
fnmsubs[.]     59         30          FPU     1-1-1     —
fnmsub[.]      63         30          FPU     2-1-1     —
fres[.]        59         24          FPU     10        —
frsp[.]        63         12          FPU     1-1-1     —
frsqrte[.]     63         26          FPU     1-1-1     —
fsel[.]        63         23          FPU     1-1-1     —
fsubs[.]       59         20          FPU     1-1-1     —
fsub[.]        63         20          FPU     1-1-1     —
mcrfs          63         64          FPU     1-1-1     Execution
mffs[.]        63         583         FPU     1-1-1     Execution
mtfsb0[.]      63         70          FPU     3         —
mtfsb1[.]      63         38          FPU     3         —
mtfsfi[.]      63         134         FPU     3         —
mtfsf[.]       63         711         FPU     3         —





Table 6-8 shows load and store instruction latencies. Pipelined load/store instructions are 
shown with cycles of total latency and throughput cycles separated by a colon. 


Table 6-8. Load and Store Instructions 

Mnemonic    Primary    Extended    Unit    Cycles     Serialization
dcbf        31         86          LSU     3:5¹       Execution
dcbi        31         470         LSU     3:3¹       Execution
dcbst       31         54          LSU     3:5¹       Execution
dcbt        31         278         LSU     2:1        —
dcbtst      31         246         LSU     2:1        —
dcbz        31         1014        LSU     3:6¹,²     Execution
eciwx       31         310         LSU     2:1        —
ecowx       31         438         LSU     2:1        —
icbi        31         982         LSU     3:4¹       Execution
lbz         34         —           LSU     2:1        —
lbzu        35         —           LSU     2:1        —
lbzux       31         119         LSU     2:1        —
lbzx        31         87          LSU     2:1        —
lfd         50         —           LSU     2:1        —
lfdu        51         —           LSU     2:1        —
lfdux       31         631         LSU     2:1        —
lfdx        31         599         LSU     2:1        —
lfs         48         —           LSU     2:1        —
lfsu        49         —           LSU     2:1        —
lfsux       31         567         LSU     2:1        —
lfsx        31         535         LSU     2:1        —
lha         42         —           LSU     2:1        —
lhau        43         —           LSU     2:1        —
lhaux       31         375         LSU     2:1        —
lhax        31         343         LSU     2:1        —
lhbrx       31         790         LSU     2:1        —
lhz         40         —           LSU     2:1        —
lhzu        41         —           LSU     2:1        —
lhzux       31         311         LSU     2:1        —
lhzx        31         279         LSU     2:1        —
lmw         46         —           LSU     2+n³       Completion, execution
lswi        31         597         LSU     2+n³       Completion, execution
lswx        31         533         LSU     2+n³       Completion, execution
lwarx       31         20          LSU     3:1        Execution
lwbrx       31         534         LSU     2:1        —
lwz         32         —           LSU     2:1        —
lwzu        33         —           LSU     2:1        —
lwzux       31         55          LSU     2:1        —
lwzx        31         23          LSU     2:1        —
stb         38         —           LSU     2:1        —
stbu        39         —           LSU     2:1        —
stbux       31         247         LSU     2:1        —
stbx        31         215         LSU     2:1        —
stfd        54         —           LSU     2:1        —
stfdu       55         —           LSU     2:1        —
stfdux      31         759         LSU     2:1        —
stfdx       31         727         LSU     2:1        —
stfiwx      31         983         LSU     2:1        —
stfs        52         —           LSU     2:1        —
stfsu       53         —           LSU     2:1        —
stfsux      31         695         LSU     2:1        —
stfsx       31         663         LSU     2:1        —
sth         44         —           LSU     2:1        —
sthbrx      31         918         LSU     2:1        —
sthu        45         —           LSU     2:1        —
sthux       31         439         LSU     2:1        —
sthx        31         407         LSU     2:1        —
stmw        47         —           LSU     2+n³       Execution
stswi       31         725         LSU     2+n³       Execution
stswx       31         661         LSU     2+n³       Execution
stw         36         —           LSU     2:1        —
stwbrx      31         662         LSU     2:1        —
stwcx.      31         150         LSU     8:8        Execution
stwu        37         —           LSU     2:1        —
stwux       31         183         LSU     2:1        —
stwx        31         151         LSU     2:1        —
tlbie       31         306         LSU     3:4¹       Execution

Notes: 

1 For cache-ops, the first number indicates the latency in finishing a single instruction; the second indicates the 
throughput for back-to-back cache-ops. Throughput may be larger than the initial latency as more cycles may be 
needed to complete the instruction to the cache, which stays busy keeping subsequent cache-ops from executing. 
2 The throughput number of 6 cycles for dcbz assumes it is to nonglobal (M = 0) address space. For global address 
space, throughput is at least 11 cycles. 
3 Load/store multiple/string instruction cycles are represented as a fixed number of cycles plus a variable number of 
cycles, where n is the number of words accessed by the instruction. 
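
As a worked example of the two notations used in Table 6-8, the sketch below splits a "latency:throughput" entry and evaluates the "2+n" form for load/store multiple and string instructions. The helper names are hypothetical, and the figures assume cache hits and aligned operands.

```python
def parse_pipelined(entry):
    """Split a 'latency:throughput' entry such as '3:5' into two integers."""
    latency, throughput = (int(part) for part in entry.split(":"))
    return latency, throughput


def multiple_cycles(n_words, fixed=2):
    """Cycles for a load/store multiple or string instruction written as 2+n."""
    if n_words < 0:
        raise ValueError("n_words must be non-negative")
    return fixed + n_words


if __name__ == "__main__":
    # lmw transferring 8 words: the fixed 2 cycles plus one cycle per word.
    print(multiple_cycles(8))
    # dcbf: a single instruction finishes in 3 cycles; back-to-back ones are 5 apart.
    print(parse_pipelined("3:5"))
```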


Chapter 6. Instruction Timing 


For More Information On This Product, 
Go to: www.freescale.com 


Freescale Semiconductor, Inc. 

Chapter 7 
Signal Descriptions 


This chapter describes the MPC750 microprocessor’s external signals. It contains a concise 
description of individual signals, showing behavior when the signal is asserted and negated 
and when the signal is an input and an output. Note that the MPC755 microprocessor is a 
derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 
except as noted in Appendix C, “MPC755 Embedded G3 Microprocessor.” 


NOTE 


A bar over a signal name indicates that the signal is active 
low—for example, ARTRY (address retry) and TS (transfer 
start). Active-low signals are referred to as asserted (active) 
when they are low and negated when they are high. Signals that 
are not active low, such as AP[0-3] (address bus parity signals) 
and TT[0-4] (transfer type signals) are referred to as asserted 
when they are high and negated when they are low. 





The MPC750 signals are grouped as follows: 


• Address arbitration—The MPC750 uses these signals to arbitrate for address bus 
mastership. 
• Address transfer start—These signals indicate that a bus master has begun a 
transaction on the address bus. 
• Address transfer—These signals include the address bus and address parity signals. 
They are used to transfer the address and to ensure the integrity of the transfer. 
• Transfer attribute—These signals provide information about the type of transfer, 
such as the transfer size and whether the transaction is bursted, write-through, or 
cache-inhibited. 
• Address transfer termination—These signals are used to acknowledge the end of the 
address phase of the transaction. They also indicate whether a condition exists that 
requires the address phase to be repeated. 
• Data arbitration—The MPC750 uses these signals to arbitrate for data bus 
mastership. 
• Data transfer—These signals, which consist of the data bus and data parity, are used 
to transfer the data and to ensure the integrity of the transfer. 
• Data transfer termination—Data termination signals are required after each data 
beat in a data transfer. In a single-beat transaction, the data termination signals also 
indicate the end of the tenure; while in burst accesses, the data termination signals 
apply to individual beats and indicate the end of the tenure only after the final data 
beat. They also indicate whether a condition exists that requires the data phase to be 
repeated. 
• L2 cache address/data—The MPC750 has separate address and data buses for 
accessing the L2 cache (not supported in the MPC740). 
• L2 cache clock/control—These signals provide clocking and control for the L2 
cache (not supported in the MPC740). 
• Interrupts/resets—These signals include the external interrupt signal, checkstop 
signals, and both soft reset and hard reset signals. They are used to interrupt and, 
under various conditions, to reset the processor. 
• Processor status and control—These signals are used to set the reservation 
coherency bit, enable the time base, and other functions. They are also used in 
conjunction with such resources as secondary caches and the time base facility. 
• Clock control—These signals determine the system clock frequency. They can also 
be used to synchronize multiprocessor systems. 
• Test interface—The JTAG (IEEE 1149.1a-1993) interface and the common on-chip 
processor (COP) unit provide a serial interface to the system for performing 
board-level boundary-scan interconnect tests. 


7.1 Signal Configuration 


Figure 7-1 illustrates the MPC750’s signal configuration, showing how the signals are 
grouped. A pinout showing pin numbers is included in the MPC750 hardware 
specifications. 
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Figure 7-1. MPC750 Signal Groups 


7.2 Signal Descriptions 


This section describes individual MPC750 signals, grouped according to Figure 7-1. Note 
that the following sections summarize signal functions. Chapter 8, “System Interface 
Operation,” describes many of these signals in greater detail, both with respect to how 
individual signals function and how groups of signals interact. 
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7.2.1. Address Bus Arbitration Signals 


The address arbitration signals are input and output signals the MPC750 uses to request the 
address bus, recognize when the request is granted, and indicate to other devices when 
mastership is granted. For a detailed description of how these signals interact, see 
Section 8.3.1, “Address Bus Arbitration.” 


7.2.1.1 Bus Request (BR)—Output 


Following are the state meaning and timing comments for the BR output signal. 


State Meaning 

Asserted—Indicates that the MPC750 is requesting mastership of 
the address bus. Note that BR may be asserted for one or more 
cycles, and then de-asserted due to an internal cancellation of the bus 
request (for example, due to a load hit in the touch load buffer). See 
Section 8.3.1, “Address Bus Arbitration.” 

Negated—Indicates that the MPC750 is not requesting the address 
bus. The MPC750 may have no bus operation pending, it may be 
parked, or the ARTRY input was asserted on the previous bus clock 
cycle. 

Timing Comments 

Assertion—Occurs when the MPC750 is not parked and a bus 
transaction is needed. This may occur even if the two possible 
pipeline accesses have occurred. BR will also be asserted for one 
cycle during the execution of a dcbz instruction, and during the 
execution of a load instruction that hits in the touch load buffer. 

Negation—Occurs for at least one bus clock cycle after an accepted, 
qualified bus grant (see BG and ABB), even if another transaction is 
pending. It is also negated for at least one bus clock cycle when the 
assertion of ARTRY is detected on the bus. 


7.2.1.2 Bus Grant (BG)—Input 


Following are the state meaning and timing comments for the BG input signal. 


State Meaning Asserted—Indicates that the MPC750 may, with proper
qualification, assume mastership of the address bus. A qualified bus
grant occurs when BG is asserted and ABB and ARTRY are not
asserted the bus cycle following the assertion of AACK. The ABB
and ARTRY signals are driven by the MPC750 or other bus masters.
If the MPC750 is parked, BR need not be asserted for the qualified
bus grant. See Section 8.3.1, “Address Bus Arbitration.”


Negated—Indicates that the MPC750 is not the next potential
address bus master.


Timing Comments Assertion—May occur at any time to indicate the MPC750 can use
the address bus. After the MPC750 assumes bus mastership, it does
not check for a qualified bus grant again until the cycle during which
the address bus tenure completes (assuming it has another
transaction to run). The MPC750 does not accept a BG in the cycles
between the assertion of any TS and AACK.


Negation—May occur at any time to indicate the MPC750 cannot
use the bus. The MPC750 may still assume bus mastership on the bus
clock cycle of the negation of BG because during the previous cycle
BG indicated to the MPC750 that it could take mastership (if
qualified).
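

The qualification rule for BG can be read as a simple predicate. The following Python sketch is illustrative only: it models each signal as a logical "asserted" flag sampled on a single bus clock (the physical pins are active-low), it does not model the cycle relationship to AACK, and the names AddressBusSample and is_qualified_bus_grant are hypothetical rather than taken from this manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressBusSample:
    """Logical signal states on one bus clock; True means asserted."""
    bg: bool     # bus grant input
    abb: bool    # address bus busy
    artry: bool  # address retry


def is_qualified_bus_grant(sample: AddressBusSample) -> bool:
    """BG asserted while ABB and ARTRY are both not asserted."""
    return sample.bg and not sample.abb and not sample.artry
```

Note that BR does not appear in the predicate, which matches the statement that a parked MPC750 does not need to assert BR to receive a qualified bus grant.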


7.2.1.3 Address Bus Busy (ABB) 
The address bus busy (ABB) signal is both an input and an output signal. 


7.2.1.3.1 Address Bus Busy (ABB)—Output


Following are the state meaning and timing comments for the ABB output signal.


State Meaning Asserted—Indicates that the MPC750 is the address bus master. See
Section 8.3.1, “Address Bus Arbitration.”


Negated—Indicates that the MPC750 is not using the address bus. If
ABB is negated during the bus clock cycle following a qualified bus
grant, the MPC750 did not accept mastership even if BR was
asserted. This can occur if a potential transaction is aborted
internally before the transaction begins.


Timing Comments Assertion—Occurs on the bus clock cycle following a qualified BG
that is accepted by the processor (see Negated).


Negation—Occurs for a minimum of one-half bus clock cycle
following the assertion of AACK. If ABB is negated during the bus
clock cycle after a qualified bus grant, the MPC750 did not accept
mastership, even if BR was asserted.


High Impedance—Occurs after ABB is negated.


7.2.1.3.2 Address Bus Busy (ABB)—Input 


Following are the state meaning and timing comments for the ABB input signal. 


State Meaning Asserted—Indicates that the address bus is in use. This condition
effectively blocks the MPC750 from assuming address bus
ownership, regardless of the BG input; see Section 8.3.1, “Address
Bus Arbitration.”


Negated—Indicates that the address bus is not owned by another bus
master and that it is available to the MPC750 when accompanied by
a qualified bus grant.


Timing Comments Assertion—May occur when the MPC750 must be kept from using 
the address bus (and the processor is not currently asserting ABB). 





Negation—May occur whenever the MPC750 can use the address 
bus. 


7.2.2 Address Transfer Start Signals 


Address transfer start signals are input and output signals that indicate that an address bus 
transfer has begun. The transfer start (TS) signal identifies the operation as a memory 
transaction. 


For detailed information about how TS interacts with other signals, refer to Section 8.3.2, 
“Address Transfer.” 


7.2.2.1 Transfer Start (TS)
The TS signal is both an input and an output signal on the MPC750.


7.2.2.1.1 Transfer Start (TS)—Output
Following are the state meaning and timing comments for the TS output signal.


State Meaning Asserted—Indicates that the MPC750 has begun a memory bus
transaction and that the address bus and transfer attribute signals are
valid. When asserted with the appropriate TT[0-4] signals, it is also
an implied data bus request for a memory transaction (unless it is an
address-only operation).


Negated—Indicates that no bus transaction is occurring during 
normal operation. 


Timing Comments Assertion—Coincides with the assertion of ABB. 
Negation—Occurs one bus clock cycle after TS is asserted. 
High Impedance—Coincides with the negation of ABB. 


7.2.2.1.2 Transfer Start (TS)—Input 


Following are the state meaning and timing comments for the TS input signal. 


State Meaning Asserted—Indicates that another master has begun a bus transaction 
and that the address bus and transfer attribute signals are valid for 
snooping (see GBL). 





Negated—Indicates that no bus transaction is occurring.


Timing Comments Assertion—May occur during the assertion of ABB.
Negation—Must occur one bus clock cycle after TS is asserted.


7.2.3 Address Transfer Signals 


The address transfer signals are used to transmit the address and to generate and monitor 
parity for the address transfer. For a detailed description of how these signals interact, refer 
to Section 8.3.2, “Address Transfer.” 


7.2.3.1 Address Bus (A[0-31])
The address bus (A[0-31]) consists of 32 signals that are both input and output signals.


7.2.3.1.1 Address Bus (A[0-31])—Output


Following are the state meaning and timing comments for the A[0-31] output signals. 


State Meaning Asserted/Negated—Represents the physical address (real address in 
the architecture specification) of the data to be transferred. On burst 
transfers, the address bus presents the double-word-aligned address 
containing the critical code/data that missed the cache on a read 
operation, or the first double word of the cache line on a write 
operation. Note that the address output during burst operations is not 
incremented. See Section 8.3.2, “Address Transfer.” 


Timing Comments Assertion/Negation—Occurs on the bus clock cycle after a qualified 
bus grant (coincides with assertion of ABB and TS). 





High Impedance—Occurs one bus clock cycle after AACK is 
asserted. 


7.2.3.1.2 Address Bus (A[0-31])—Input 
Following are the state meaning and timing comments for the A[0-31] input signals. 


State Meaning Asserted/Negated—Represents the physical address of a snoop 
operation. 


Timing Comments Assertion/Negation—Must occur on the same bus clock cycle as the 
assertion of TS; is sampled by MPC750 only on this cycle. 


7.2.3.2 Address Bus Parity (AP[0-3]) 


The address bus parity (AP[0-3]) signals are both input and output signals reflecting one
bit of odd-byte parity for each of the 4 bytes of address when a valid address is on the bus.


7.2.3.2.1 Address Bus Parity (AP[0-3])—Output


Following are the state meaning and timing comments for the AP[0-3] output signals on
the MPC750.


State Meaning Asserted/Negated—Represents odd parity for each of the 4 bytes of 
the physical address for a transaction. Odd parity means that an odd 
number of bits, including the parity bit, are driven high. The signal 
assignments correspond to the following: 


AP0 A[0-7]
AP1 A[8-15]
AP2 A[16-23]
AP3 A[24-31]


For more information, see Section 8.3.2.1, “Address Bus Parity.” 


Timing Comments Assertion/Negation—The same as A[0-31]. 
High Impedance—The same as A[0-31]. 
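

As a worked illustration of the odd-parity rule, the Python sketch below computes the four parity bits for a 32-bit physical address. It assumes that A0 is the most significant address bit, so AP0 covers the most significant byte. The function name address_parity is hypothetical, and the sketch models logical bit values only.

```python
from typing import List


def address_parity(address: int) -> List[int]:
    """Return [AP0, AP1, AP2, AP3] for a 32-bit physical address.

    AP0 covers A[0-7] (most significant byte, by assumption) and
    AP3 covers A[24-31] (least significant byte).
    """
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    parity = []
    for lane in range(4):
        byte = (address >> (24 - 8 * lane)) & 0xFF
        ones = bin(byte).count("1")
        # Odd parity: the byte plus its parity bit must hold an odd
        # number of 1s, so the parity bit is 1 when the byte count is even.
        parity.append(0 if ones % 2 else 1)
    return parity
```

For example, an all-zero address yields a parity bit of 1 on every lane, because each byte contributes no 1s and the parity bit must make the total odd.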


7.2.3.2.2 Address Bus Parity (AP[0-3])—Input


Following are the state meaning and timing comments for the AP[0-3] input signals on the
MPC750.


State Meaning Asserted/Negated—Represents odd parity for each of the 4 bytes of
the physical address for snooping operations. Detected even parity
causes the processor to take a machine check exception or enter the
checkstop state if address parity checking is enabled in the HID0
register; see Section 2.1.2.2, “Hardware Implementation-Dependent
Register 0.”


Timing Comments Assertion/Negation—The same as A[0-31]. 


7.2.4 Address Transfer Attribute Signals 


The transfer attribute signals are a set of signals that further characterize the transfer—such 
as the size of the transfer, whether it is a read or write operation, and whether it is a burst 
or single-beat transfer. For a detailed description of how these signals interact, see 
Section 8.3.2, “Address Transfer.” 


Note that some signal functions vary depending on whether the transaction is a memory 
access or an I/O access. 


7.2.4.1 Transfer Type (TT[0-4])


The transfer type (TT[0-4]) signals consist of five input/output signals on the MPC750. For
a complete description of TT[0-4] signals and for transfer type encodings, see Table 7-1.


7.2.4.1.1 Transfer Type (TT[0-4])—Output


Following are the state meaning and timing comments for the TT[0-4] output signals on
the MPC750.


State Meaning Asserted/Negated—Indicates the type of transfer in progress.


Timing Comments Assertion/Negation/High Impedance—The same as A[0-31].


7.2.4.1.2 Transfer Type (TT[0-4])—Input


Following are the state meaning and timing comments for the TT[0-4] input signals on the
MPC750.


State Meaning Asserted/Negated—Indicates the type of transfer in progress (see
Table 7-2).


Timing Comments Assertion/Negation—The same as A[0-31].


Table 7-1 describes the transfer encodings for an MPC750 bus master.


Table 7-1. Transfer Type Encodings for MPC750 Bus Master

MPC750 Bus Master Transaction | Transaction Source | TT0 | TT1 | TT2 | TT3 | TT4 | 60x Bus Specification Command | Transaction
Address only¹ | dcbst | 0 | 0 | 0 | 0 | 0 | Clean block | Address only
Address only¹ | dcbf | 0 | 0 | 1 | 0 | 0 | Flush block | Address only
Address only¹ | sync | 0 | 1 | 0 | 0 | 0 | sync | Address only
Address only¹ | dcbz or dcbi | 0 | 1 | 1 | 0 | 0 | Kill block | Address only
Address only¹ | eieio | 1 | 0 | 0 | 0 | 0 | eieio | Address only
Single-beat write (nonGBL) | ecowx | 1 | 0 | 1 | 0 | 0 | External control word write | Single-beat write
N/A | N/A | 1 | 1 | 0 | 0 | 0 | TLB invalidate | Address only
Single-beat read (nonGBL) | eciwx | 1 | 1 | 1 | 0 | 0 | External control word read | Single-beat read
N/A | N/A | 0 | 0 | 0 | 0 | 1 | lwarx reservation set | Address only
N/A | N/A | 0 | 0 | 1 | 0 | 1 | Reserved | -
N/A | N/A | 0 | 1 | 0 | 0 | 1 | tlbsync | Address only
N/A | N/A | 0 | 1 | 1 | 0 | 1 | icbi | Address only
N/A | N/A | 1 | X | X | 0 | 1 | Reserved | -
Single-beat write | Caching-inhibited or write-through store | 0 | 0 | 0 | 1 | 0 | Write-with-flush | Single-beat write or burst
Burst (nonGBL) | Cast-out, or snoop copyback | 0 | 0 | 1 | 1 | 0 | Write-with-kill | Burst
Single-beat read | Caching-inhibited load or instruction fetch | 0 | 1 | 0 | 1 | 0 | Read | Single-beat read or burst
Burst | Load miss, store miss, or instruction fetch | 0 | 1 | 1 | 1 | 0 | Read-with-intent-to-modify | Burst
Single-beat write | stwcx. | 1 | 0 | 0 | 1 | 0 | Write-with-flush-atomic | Single-beat write
N/A | N/A | 1 | 0 | 1 | 1 | 0 | Reserved | N/A
Single-beat read | lwarx (caching-inhibited load) | 1 | 1 | 0 | 1 | 0 | Read-atomic | Single-beat read or burst
Burst | lwarx (load miss) | 1 | 1 | 1 | 1 | 0 | Read-with-intent-to-modify-atomic | Burst
N/A | N/A | 0 | 0 | 0 | 1 | 1 | Reserved | -
N/A | N/A | 0 | 0 | 1 | 1 | 1 | Reserved | -
N/A | N/A | 0 | 1 | 0 | 1 | 1 | Read-with-no-intent-to-cache | Single-beat read or burst
N/A | N/A | 0 | 1 | 1 | 1 | 1 | Reserved | -
N/A | N/A | 1 | X | X | 1 | 1 | Reserved | -

Note: ¹Address-only transaction occurs if enabled by setting HID0[ABE] bit to 1.
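

The encodings in Table 7-1 lend themselves to a lookup table. The Python sketch below is a minimal illustration: the names TT_COMMANDS and decode_tt are hypothetical, keys are the five bits TT0 through TT4 written left to right as logical values, and any encoding not named in the table is reported as reserved. The 00010 key for write-with-flush assumes the regular ordering of the table rows.

```python
# 60x bus command for each named TT[0-4] encoding in Table 7-1.
# Each key is the bit string TT0 TT1 TT2 TT3 TT4.
TT_COMMANDS = {
    "00000": "Clean block",
    "00100": "Flush block",
    "01000": "sync",
    "01100": "Kill block",
    "10000": "eieio",
    "10100": "External control word write",
    "11000": "TLB invalidate",
    "11100": "External control word read",
    "00001": "lwarx reservation set",
    "01001": "tlbsync",
    "01101": "icbi",
    "00010": "Write-with-flush",
    "00110": "Write-with-kill",
    "01010": "Read",
    "01110": "Read-with-intent-to-modify",
    "10010": "Write-with-flush-atomic",
    "11010": "Read-atomic",
    "11110": "Read-with-intent-to-modify-atomic",
    "01011": "Read-with-no-intent-to-cache",
}


def decode_tt(tt_bits: str) -> str:
    """Map a five-character bit string to its 60x bus command name."""
    if len(tt_bits) != 5 or any(bit not in "01" for bit in tt_bits):
        raise ValueError("expected five characters, each '0' or '1'")
    # Every encoding not listed above is marked Reserved in Table 7-1.
    return TT_COMMANDS.get(tt_bits, "Reserved")
```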


Table 7-2 describes the 60x bus specification transfer encodings and the MPC750 bus 
snoop response on an address hit. 


Table 7-2. MPC750 Snoop Hit Response

60x Bus Specification Command | Transaction | TT0 | TT1 | TT2 | TT3 | TT4 | MPC750 Bus Snooper; Action on Hit
Clean block | Address only | 0 | 0 | 0 | 0 | 0 | N/A
Flush block | Address only | 0 | 0 | 1 | 0 | 0 | N/A
sync | Address only | 0 | 1 | 0 | 0 | 0 | N/A
Kill block | Address only | 0 | 1 | 1 | 0 | 0 | Flush, cancel reservation
eieio | Address only | 1 | 0 | 0 | 0 | 0 | N/A
External control word write | Single-beat write | 1 | 0 | 1 | 0 | 0 | N/A
TLB invalidate | Address only | 1 | 1 | 0 | 0 | 0 | N/A
External control word read | Single-beat read | 1 | 1 | 1 | 0 | 0 | N/A
lwarx reservation set | Address only | 0 | 0 | 0 | 0 | 1 | N/A
Reserved | - | 0 | 0 | 1 | 0 | 1 | N/A
tlbsync | Address only | 0 | 1 | 0 | 0 | 1 | N/A
icbi | Address only | 0 | 1 | 1 | 0 | 1 | N/A
Reserved | - | 1 | X | X | 0 | 1 | N/A
Write-with-flush | Single-beat write or burst | 0 | 0 | 0 | 1 | 0 | Flush, cancel reservation
Write-with-kill | Single-beat write or burst | 0 | 0 | 1 | 1 | 0 | Kill, cancel reservation
Read | Single-beat read or burst | 0 | 1 | 0 | 1 | 0 | Clean or flush
Read-with-intent-to-modify | Burst | 0 | 1 | 1 | 1 | 0 | Flush
Write-with-flush-atomic | Single-beat write | 1 | 0 | 0 | 1 | 0 | Flush, cancel reservation
Reserved | N/A | 1 | 0 | 1 | 1 | 0 | N/A
Read-atomic | Single-beat read or burst | 1 | 1 | 0 | 1 | 0 | Clean or flush
Read-with-intent-to-modify-atomic | Burst | 1 | 1 | 1 | 1 | 0 | Flush
Reserved | - | 0 | 0 | 0 | 1 | 1 | N/A
Reserved | - | 0 | 0 | 1 | 1 | 1 | N/A
Read-with-no-intent-to-cache | Single-beat read or burst | 0 | 1 | 0 | 1 | 1 | Clean
Reserved | - | 0 | 1 | 1 | 1 | 1 | N/A
Reserved | - | 1 | X | X | 1 | 1 | N/A




















7.2.4.2 Transfer Size (TSIZ[0-2])—Output


Following are the state meaning and timing comments for the transfer size (TSIZ[0-2])
output signals on the MPC750.


State Meaning Asserted/Negated—For memory accesses, these signals, along with
TBST, indicate the data transfer size for the current bus operation, as
shown in Table 7-3. Table 8-3 shows how the transfer size signals are
used with the address signals for aligned transfers. Table 8-4 shows
how the transfer size signals are used with the address signals for
misaligned transfers. Note that the MPC750 does not generate all
possible TSIZ[0-2] encodings.


For external control instructions (eciwx and ecowx), TSIZ[0-2] are
used to output bits 29-31 of the external access register (EAR),
which are used to form the resource ID (TBST||TSIZ0-TSIZ2).


Timing Comments Assertion/Negation—The same as A[0-31].
High Impedance—The same as A[0-31].


Table 7-3. Data Transfer Size

TBST | TSIZ[0-2] | Transfer Size
Asserted | 010 | Burst (32 bytes)
Negated | 000 | 8 bytes
Negated | 001 | 1 byte
Negated | 010 | 2 bytes
Negated | 011 | 3 bytes
Negated | 100 | 4 bytes
Negated | 101 | 5 bytes¹
Negated | 110 | 6 bytes¹
Negated | 111 | 7 bytes¹

Note: ¹Not generated by MPC750.
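

The mapping in Table 7-3 can be expressed as a small decoder. This Python sketch is illustrative: the function name transfer_size_bytes is hypothetical, TBST is passed as a logical "asserted" flag, and TSIZ[0-2] is passed as a 3-bit integer with TSIZ0 as the most significant bit.

```python
def transfer_size_bytes(tbst_asserted: bool, tsiz: int) -> int:
    """Return the transfer size in bytes for a TBST/TSIZ[0-2] combination."""
    if not 0 <= tsiz <= 0b111:
        raise ValueError("TSIZ[0-2] is a 3-bit field")
    if tbst_asserted:
        # Table 7-3 lists a single burst encoding.
        if tsiz != 0b010:
            raise ValueError("Table 7-3 lists only TSIZ = 010 with TBST asserted")
        return 32
    # With TBST negated, 000 means 8 bytes; other values give the byte count.
    return 8 if tsiz == 0 else tsiz
```

The decoder accepts the 5-, 6-, and 7-byte encodings because they are defined in the table, even though the MPC750 itself does not generate them.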


7.2.4.3 Transfer Burst (TBST) 
The transfer burst (TBST) signal is an input/output signal on the MPC750. 


7.2.4.3.1 Transfer Burst (TBST)—Output
Following are the state meaning and timing comments for the TBST output signal.


State Meaning Asserted—Indicates that a burst transfer is in progress.


Negated—Indicates that a burst transfer is not in progress.


For external control instructions (eciwx and ecowx), TBST is used to
output bit 28 of the EAR, which is used to form the resource ID
(TBST||TSIZ0-TSIZ2).


Timing Comments Assertion/Negation—The same as A[0-31].
High Impedance—The same as A[0-31].
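

To make the resource ID concatenation concrete, the Python sketch below splits EAR bits 28-31 into the value carried on TBST and the value carried on TSIZ[0-2]. It assumes that bit 0 is the most significant bit of the 32-bit register, so bits 28-31 are the four least significant bits. The function name resource_id_bits is hypothetical, and the sketch returns logical bit values only; it makes no claim about pin polarity.

```python
def resource_id_bits(ear: int) -> dict:
    """Split EAR bits 28-31 into the TBST bit and the TSIZ[0-2] field."""
    if not 0 <= ear <= 0xFFFFFFFF:
        raise ValueError("EAR is modeled as a 32-bit value")
    rid = ear & 0xF  # EAR[28-31], the 4-bit resource ID
    return {
        "rid": rid,
        "tbst_bit": (rid >> 3) & 1,  # EAR bit 28
        "tsiz": rid & 0b111,         # EAR bits 29-31
    }
```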


7.2.4.3.2 Transfer Burst (TBST)—Input 
Following are the state meaning and timing comments for the TBST input signal. 


State Meaning Asserted/Negated—Used when snooping for single-beat reads (read 
with no intent to cache). 


Timing Comments Assertion/Negation—The same as A[0-31].


MPC750 RISC Microprocessor Family User’s Manual 


For More Information On This Product, 
Go to: www.freescale.com 


Freescale Semiconductor, Inc. 
Signal Descriptions 


7.2.4.4 Cache Inhibit (CI)—Output
The cache inhibit (CI) signal is an output signal on the MPC750. Following are the state 
meaning and timing comments for the CI signal. 


State Meaning Asserted—Indicates that a single-beat transfer will not be cached, 
reflecting the setting of the I bit for the block or page that contains 
the address of the current transaction. 


Negated—Indicates that a burst transfer will allocate an MPC750 
data cache block. 


Timing Comments Assertion/Negation—The same as A[0-31]. 
High Impedance—The same as A[0-31]. 


7.2.4.5 Write-Through (WT)—Output 


The write-through (WT) signal is an output signal on the MPC750. Following are the state 
meaning and timing comments for the WT signal. 


State Meaning Asserted—Indicates that a single-beat write transaction is 
write-through, reflecting the value of the W bit for the block or page 
that contains the address of the current transaction. Assertion during 
a read operation indicates instruction fetching. 


Negated—Indicates that a write transaction is not write-through; 
during a read operation negation indicates a data load. 


Timing Comments Assertion/Negation—The same as A[0-31]. 
High Impedance—The same as A[0-31]. 


7.2.4.6 Global (GBL) 
The global (GBL) signal is an input/output signal on the MPC750. 


7.2.4.6.1 Global (GBL)—Output
Following are the state meaning and timing comments for the GBL output signal. 


State Meaning Asserted—Indicates that a transaction is global, reflecting the setting 
of the M bit for the block or page that contains the address of the 
current transaction (except in the case of copy-back operations and 
instruction fetches, which are nonglobal.) 


Negated—Indicates that a transaction is not global. 


Timing Comments Assertion/Negation—The same as A[0-31].
High Impedance—The same as A[0-31].


7.2.4.6.2 Global (GBL)—Input


Following are the state meaning and timing comments for the GBL input signal.


State Meaning Asserted—Indicates that a transaction must be snooped by the
MPC750.


Negated—Indicates that a transaction is not snooped by the
MPC750.


Timing Comments Assertion/Negation—The same as A[0-31].


7.2.5 Address Transfer Termination Signals 


The address transfer termination signals are used to indicate either that the address phase 
of the transaction has completed successfully or must be repeated, and when it should be 
terminated. For detailed information about how these signals interact, see Section 8.3.3, 
“Address Transfer Termination.” 


7.2.5.1 Address Acknowledge (AACK)—Input 


The address acknowledge (AACK) signal is an input-only signal on the MPC750. 
Following are the state meaning and timing comments for the AACK signal. 


State Meaning Asserted—Indicates that the address phase of a transaction is
complete. The address bus will go to a high-impedance state on the
next bus clock cycle. The MPC750 samples ARTRY on the bus clock
cycle following the assertion of AACK.


Negated—(During ABB) indicates that the address bus and the
transfer attributes must remain driven.


Timing Comments Assertion—May occur as early as the bus clock cycle after TS is
asserted; assertion can be delayed to allow adequate address access
time for slow devices. For example, if an implementation supports
slow snooping devices, an external arbiter can postpone the assertion
of AACK.


Negation—Must occur one bus clock cycle after the assertion of
AACK.








7.2.5.2 Address Retry (ARTRY) 
The address retry (ARTRY) signal is both an input and output signal on the MPC750. 


7.2.5.2.1 Address Retry (ARTRY)—Output


Following are the state meaning and timing comments for the ARTRY output signal.


State Meaning Asserted—Indicates that the MPC750 detects a condition in which a
snooped address tenure must be retried. If the MPC750 needs to
update memory as a result of the snoop that caused the retry, the
MPC750 asserts BR the second cycle after AACK if ARTRY is
asserted.


High Impedance—Indicates that the MPC750 does not need the
snooped address tenure to be retried.


Timing Comments Assertion—Asserted the second bus cycle following the assertion of
TS if a retry is required.


Negation/High Impedance—Driven until the bus clock cycle following the
assertion of AACK. Because this signal may be simultaneously
driven by multiple devices, it negates in a unique fashion. First the
buffer goes to high impedance for a minimum of one-half processor
cycle (dependent on the clock mode), then it is driven negated for
one-half bus cycle before returning to high impedance.


This special method of negation may be disabled by setting
precharge disable in HID0.


7.2.5.2.2 Address Retry (ARTRY)—Input 


Following are the state meaning and timing comments for the ARTRY input signal. 


State Meaning Asserted—If the MPC750 is the address bus master, ARTRY
indicates that the MPC750 must retry the preceding address tenure
and immediately negate BR (if asserted). If the associated data
tenure has already started, the MPC750 also aborts the data tenure
immediately, even if the burst data has been received. If the MPC750
is not the address bus master, this input indicates that the MPC750
should immediately negate BR to allow an opportunity for a
copy-back operation to main memory after a snooping bus master
asserts ARTRY. Note that the subsequent address presented on the
address bus may not be the same one associated with the assertion of
the ARTRY signal.


Negated/High Impedance—Indicates that the MPC750 does not
need to retry the last address tenure.


Timing Comments Assertion—May occur as early as the second cycle following the
assertion of TS, and must occur by the bus clock cycle immediately
following the assertion of AACK if an address retry is required.


Negation—Must occur two bus clock cycles after the assertion of
AACK.
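

The assertion window for the ARTRY input can be checked with simple cycle arithmetic. The Python sketch below is a hedged illustration that counts bus clock cycles as integers and reads "the second cycle following the assertion of TS" as cycle TS + 2. The function name artry_assertion_in_window is hypothetical.

```python
def artry_assertion_in_window(ts_cycle: int, aack_cycle: int,
                              artry_cycle: int) -> bool:
    """Check that ARTRY is first asserted within its allowed window.

    Earliest: the second cycle after TS (assumed to mean ts_cycle + 2).
    Latest: the cycle immediately following the assertion of AACK.
    """
    if aack_cycle <= ts_cycle:
        raise ValueError("AACK is asserted no earlier than the cycle after TS")
    return ts_cycle + 2 <= artry_cycle <= aack_cycle + 1
```

With the fastest possible AACK (one cycle after TS), the window collapses to the single cycle TS + 2. Delaying AACK widens the window, which is how a system accommodates slow snooping devices.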


7.2.6 Data Bus Arbitration Signals


Like the address bus arbitration signals, data bus arbitration signals maintain an orderly
process for determining data bus mastership. Note that there is no data bus arbitration signal
equivalent to the address bus arbitration signal BR (bus request), because, except for
address-only transactions, TS implies data bus requests. For a detailed description of how
these signals interact, see Section 8.4.1, “Data Bus Arbitration.”


One special signal, DBWO, allows the MPC750 to be configured dynamically to write data
out of order with respect to read data. For detailed information about using DBWO, see
Section 8.10, “Using Data Bus Write Only.”





7.2.6.1 Data Bus Grant (DBG)—Input


The data bus grant (DBG) signal is an input-only signal on the MPC750. Following are the 
state meaning and timing comments for the DBG signal. 





State Meaning Asserted—Indicates that the MPC750 may, with the proper 
qualification, assume mastership of the data bus. The MPC750 
derives a qualified data bus grant when DBG is asserted and DBB, 
DRTRY, and ARTRY are negated; that is, the data bus is not busy 
(DBB is negated), there is no outstanding attempt to retry the current 
data tenure (DRTRY is negated), and there is no outstanding attempt 
to perform an ARTRY of the associated address tenure. 


Negated—Indicates that the MPC750 must hold off its data tenures. 

















Timing Comments Assertion—May occur any time to indicate the MPC750 is free to 
take data bus mastership. It is not sampled until TS is asserted. 


Negation—May occur at any time to indicate the MPC750 cannot 
assume data bus mastership. 
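

The qualified data bus grant condition combines four signals. The following Python sketch is a minimal illustration that treats each argument as a logical "asserted" flag sampled on the same bus clock. The function name is hypothetical.

```python
def is_qualified_data_bus_grant(dbg: bool, dbb: bool,
                                drtry: bool, artry: bool) -> bool:
    """DBG asserted while DBB, DRTRY, and ARTRY are all negated."""
    return dbg and not (dbb or drtry or artry)
```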


7.2.6.2 Data Bus Write Only (DBWO)—Input 


The data bus write only (DBWO) signal is an input-only signal on the MPC750. Following 
are the state meaning and timing comments for the DBWO signal. 


State Meaning Asserted—Indicates that the MPC750 may run the data bus tenure 
for an outstanding write address even if a read address is pipelined 
before the write address. Refer to Section 8.10, “Using Data Bus 
Write Only,” for detailed instructions for using DBWO. 


Negated—Indicates that the MPC750 must run the data bus tenures 
in the same order as the address tenures. 








Timing Comments Assertion—Must occur no later than a qualified DBG for an
outstanding write tenure. DBWO is sampled by the MPC750 on the
clock of a qualified DBG. If no write requests are pending, the
MPC750 will ignore DBWO and assume data bus ownership for the
next pending read request.


Negation—May occur any time after a qualified DBG and before the
next assertion of DBG.
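

The effect of DBWO on data tenure ordering can be sketched as a selection over the outstanding address tenures. This Python example is a simplified model under stated assumptions: pending tenures are listed oldest first as the strings "read" or "write", the decision is made on the clock of a qualified DBG, and the function name next_data_tenure is hypothetical.

```python
from typing import List, Optional


def next_data_tenure(pending: List[str], dbwo_asserted: bool) -> Optional[int]:
    """Return the index in `pending` of the tenure whose data phase runs next.

    `pending` holds "read" or "write" entries, oldest address tenure first.
    Returns None when nothing is outstanding.
    """
    if not pending:
        return None
    if dbwo_asserted and "write" in pending:
        # DBWO lets the oldest outstanding write run ahead of earlier reads.
        return pending.index("write")
    # DBWO negated, or no write pending: keep address-tenure order.
    return 0
```

For a pipelined read followed by a write, asserting DBWO selects index 1 (the write). With two reads outstanding, DBWO is ignored and index 0 is selected.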





7.2.6.3 Data Bus Busy (DBB) 
The data bus busy (DBB) signal is both an input and output signal on the MPC750. 


7.2.6.3.1 Data Bus Busy (DBB)—Output


Following are the state meaning and timing comments for the DBB output signal. 


State Meaning Asserted—Indicates that the MPC750 is the data bus master. The 
MPC750 always assumes data bus mastership if it needs the data bus 
and is given a qualified data bus grant (see DBG). 


Negated—Indicates that the MPC750 is not using the data bus. 





Timing Comments Assertion—Occurs during the bus clock cycle following a qualified 
DBG. 


Negation—Occurs for a minimum of one-half bus clock cycle
(dependent on clock mode) following the assertion of the final TA.





High Impedance—Occurs after DBB is negated. 


7.2.6.3.2 Data Bus Busy (DBB)—Input 


Following are the state meaning and timing comments for the DBB input signal. 


State Meaning Asserted—Indicates that another device is bus master. 
Negated—Indicates that the data bus is free (with proper 
qualification, see DBG) for use by the MPC750. 


Timing Comments Assertion—Must occur when the MPC750 must be prevented from 
using the data bus. 





Negation—May occur whenever the data bus is available. 


7.2.7 Data Transfer Signals


Like the address transfer signals, the data transfer signals are used to transmit data and to
generate and monitor parity for the data transfer. For a detailed description of how the data
transfer signals interact, see Section 8.4.3, “Data Transfer.”


7.2.7.1 Data Bus (DH[0-31], DL[0-31])


The data bus (DH[0-31] and DL[0-31]) consists of 64 signals that are both inputs and
outputs on the MPC750. Following are the state meaning and timing comments for the DH
and DL signals.


State Meaning The data bus has two halves—data bus high (DH) and data bus low 
(DL). See Table 7-4 for the data bus lane assignments. 


Timing Comments The data bus is driven once for noncached transactions and four 
times for cache transactions (bursts). 


Table 7-4. Data Bus Lane Assignments

Data Bus Signals | Byte Lane
DH[0-7] | 0
DH[8-15] | 1
DH[16-23] | 2
DH[24-31] | 3
DL[0-7] | 4
DL[8-15] | 5
DL[16-23] | 6
DL[24-31] | 7
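

The lane assignments follow a regular pattern that a short helper can reproduce. The Python sketch below is illustrative only: the function names are hypothetical, and byte_on_lane assumes that DH0 is the most significant bit of the 64-bit double word, so byte lane 0 carries the most significant byte.

```python
def byte_lane_signals(lane: int) -> str:
    """Return the data bus signal range that carries the given byte lane."""
    if not 0 <= lane <= 7:
        raise ValueError("byte lanes are numbered 0 through 7")
    half = "DH" if lane < 4 else "DL"
    first = 8 * (lane % 4)
    return f"{half}[{first}-{first + 7}]"


def byte_on_lane(doubleword: int, lane: int) -> int:
    """Extract the byte carried on a lane from a 64-bit double word."""
    if not 0 <= lane <= 7:
        raise ValueError("byte lanes are numbered 0 through 7")
    # Lane 0 is assumed to hold the most significant byte.
    return (doubleword >> (8 * (7 - lane))) & 0xFF
```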





7.2.7.1.1 Data Bus (DH[0-31], DL[0-31])—Output
Following are the state meaning and timing comments for the DH and DL output signals. 


State Meaning Asserted/Negated—Represents the state of data during a data write. 
Byte lanes not selected for data transfer will not supply valid data. 


Timing Comments Assertion/Negation—Initial beat coincides with DBB and, for 
bursts, transitions on the bus clock cycle following each assertion of 
TA. 


High Impedance—Occurs on the bus clock cycle after the final 
assertion of TA, following the assertion of TEA, or in certain ARTRY 
cases. 





7.2.7.1.2 Data Bus (DH[0-31], DL[0-31])—Input
Following are the state meaning and timing comments for the DH and DL input signals. 


State Meaning Asserted/Negated—Represents the state of data during a data read 
transaction. 


Timing Comments Assertion/Negation—Data must be valid on the same bus clock cycle 
that TA is asserted. 




7.2.7.2 Data Bus Parity (DP[0-7]) 


The eight data bus parity (DP[0-7]) signals on the MPC750 are both output and input
signals.


7.2.7.2.1 Data Bus Parity (DP[0-7])—Output


Following are the state meaning and timing comments for the DP output signals.


State Meaning Asserted/Negated—Represents odd parity for each of the 8 bytes of
data write transactions. Odd parity means that an odd number of bits,
including the parity bit, are driven high. The generation of parity is
enabled through HID0. The signal assignments are listed in
Table 7-5.


Timing Comments Assertion/Negation—The same as DL[0-31]. 
High Impedance—The same as DL[0-31]. 


Table 7-5. DP[0-7] Signal Assignments

Signal Name | Signal Assignments
DP0 | DH[0-7]
DP1 | DH[8-15]
DP2 | DH[16-23]
DP3 | DH[24-31]
DP4 | DL[0-7]
DP5 | DL[8-15]
DP6 | DL[16-23]
DP7 | DL[24-31]


7.2.7.2.2 Data Bus Parity (DP[0-7])—Input


Following are the state meaning and timing comments for the DP input signals.


State Meaning Asserted/Negated—Represents odd parity for each byte of read data.
Parity is checked on all data byte lanes, regardless of the size of the
transfer. Detected even parity causes a checkstop if data parity errors
are enabled in the HID0 register.


Timing Comments Assertion/Negation—The same as DL[0-31].
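

The read-side parity check can be illustrated by testing every byte lane for even parity. This Python sketch is a simplified model: the function name lanes_with_parity_error is hypothetical, the data is a 64-bit integer whose most significant byte is assumed to travel on lane 0, and dp holds the logical values of DP0 through DP7.

```python
from typing import List


def lanes_with_parity_error(data: int, dp: List[int]) -> List[int]:
    """Return the byte lanes whose data byte plus parity bit has even parity."""
    if len(dp) != 8 or any(bit not in (0, 1) for bit in dp):
        raise ValueError("dp must hold eight values, each 0 or 1")
    bad_lanes = []
    for lane in range(8):
        byte = (data >> (8 * (7 - lane))) & 0xFF
        ones = bin(byte).count("1") + dp[lane]
        # Odd parity is expected, so an even total marks an error.
        if ones % 2 == 0:
            bad_lanes.append(lane)
    return bad_lanes
```

All eight lanes are examined regardless of transfer size, mirroring the statement that parity is checked on all data byte lanes.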


7.2.7.3 Data Bus Disable (DBDIS)—Input 


Following are the state meaning and timing comments for the DBDIS signal. 


State Meaning Asserted—Indicates (for a write transaction) that the MPC750 must
release the data bus and the data bus parity to high impedance during
the following cycle. The data tenure remains active, DBB remains
driven, and the transfer termination signals are still monitored by the
MPC750.


Negated—Indicates the data bus should remain normally driven.
DBDIS is ignored during read transactions.





Timing Comments Assertion/Negation—May be asserted on any clock cycle when the 
MPC750 is driving or will be driving the data bus; may remain 
asserted multiple cycles. 


7.2.8 Data Transfer Termination Signals 


Data termination signals are required after each data beat in a data transfer. Note that in a 
single-beat transaction, the data termination signals also indicate the end of the tenure, 
while in burst accesses, the data termination signals apply to individual beats and indicate 
the end of the tenure only after the final data beat. 


For a detailed description of how these signals interact, see Section 8.4.4, “Data Transfer 
Termination.” 


7.2.8.1 Transfer Acknowledge (TA)—Input


Following are the state meaning and timing comments for the TA signal.


State Meaning Asserted—Indicates that a single-beat data transfer completed
successfully or that a data beat in a burst transfer completed
successfully (unless DRTRY is asserted on the next bus clock cycle).
Note that TA must be asserted for each data beat in a burst
transaction and must be asserted during assertion of DRTRY. For
more information, see Section 8.4.4, “Data Transfer Termination.”


Negated—(During DBB) indicates that, until TA is asserted, the
MPC750 must continue to drive the data for the current write or must
wait to sample the data for reads.


Timing Comments Assertion—Must not occur before AACK for the current transaction
(if the address retry mechanism is to be used to prevent invalid data
from being used by the processor); otherwise, assertion may occur at
any time during the assertion of DBB. The system can withhold
assertion of TA to indicate that the MPC750 should insert wait states
to extend the duration of the data beat.


Negation—Must occur after the bus clock cycle of the final (or only)
data beat of the transfer. For a burst transfer, the system can assert TA
for one bus clock cycle and then negate it to advance the burst
transfer to the next beat and insert wait states during the next beat.




7.2.8.2 Data Retry (DRTRY)—Input 


Following are the state meaning and timing comments for the DRTRY signal. 


State Meaning Asserted—Indicates that the MPC750 must invalidate the data from 
the previous read operation. 


Negated—Indicates that data presented with TA on the previous read 
operation is valid. Note that DRTRY is ignored for write 
transactions. 


Timing Comments Assertion—Must occur during the bus clock cycle immediately after 
TA is asserted if a retry is required. The DRTRY signal may be held 
asserted for multiple bus clock cycles. When DRTRY is negated, 
data must have been valid on the previous clock with TA asserted. 


Negation—Must occur during the bus clock cycle after a valid data 
beat. This may occur several cycles after DBB is negated, effectively 
extending the data bus tenure. 


Start-up—The DRTRY signal is sampled at the negation of 
HRESET; if DRTRY is asserted, no-DRTRY mode is selected. If 
DRTRY is negated at start-up, DRTRY is enabled. 
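

The way TA and DRTRY qualify read data can be illustrated with a small simulation. The Python sketch below is not part of the MPC750 documentation: the function name accepted_read_beats and the per-cycle tuple format are hypothetical, and the model assumes that DRTRY mode is enabled and that DRTRY is negated on any cycle past the end of the trace. It treats data presented with TA as accepted only if DRTRY is negated on the following bus clock cycle.

```python
def accepted_read_beats(cycles):
    """Return the read data values that survive DRTRY qualification.

    cycles is a list of (ta, drtry, data) tuples, one per bus clock cycle.
    ta and drtry are True when the signal is asserted (hypothetical format).
    """
    accepted = []
    for i, (ta, _drtry, data) in enumerate(cycles):
        if not ta:
            continue  # TA withheld: wait state, no data beat this cycle
        # Assumption: a cycle beyond the end of the trace has DRTRY negated.
        next_drtry = cycles[i + 1][1] if i + 1 < len(cycles) else False
        if not next_drtry:
            accepted.append(data)  # DRTRY negated on the next clock: data valid
        # Otherwise the data from this cycle is invalidated and must be
        # presented again with TA on a later cycle.
    return accepted


if __name__ == "__main__":
    trace = [
        (True, False, 0xA),     # beat accepted
        (False, False, None),   # wait state
        (True, False, 0xBAD),   # cancelled by DRTRY on the next cycle
        (True, True, 0xB),      # replacement data, accepted
        (False, False, None),
    ]
    print([hex(d) for d in accepted_read_beats(trace)])
```

A model like this can be useful when reviewing memory-controller timing, since it makes explicit that read data is only committed one clock after TA when DRTRY mode is in use.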








7.2.8.3 Transfer Error Acknowledge (TEA)—Input 


Following are the state meaning and timing comments for the TEA signal. 


State Meaning Asserted—Indicates that a bus error occurred. Causes a machine 
check exception (and possibly causes the processor to enter 
checkstop state if machine check enable bit is cleared 
(MSR[ME] = 0)). For more information, see Section 4.5.2.2, 
“Checkstop State (MSR[ME] = 0).” Assertion terminates the current 
transaction; that is, assertions of TA and DRTRY are ignored. The 
assertion of TEA causes the negation/high impedance of DBB in the 
next clock cycle. However, data entering the GPR or the cache are 
not invalidated. (Note that the term ‘exception’ is also referred to as 
‘interrupt’ in the architecture specification.) 


Negated—Indicates that no bus error was detected. 


Timing Comments Assertion—May be asserted while DBB is asserted, and the cycle 
after TA during a read operation. TEA should be asserted for one 
cycle only. 


Negation—TEA must be negated no later than the negation of DBB. 


7.2.9 System Status Signals 


Most system status signals are input signals that indicate when exceptions are received, 
when checkstop conditions have occurred, and when the MPC750 must be reset. The 
MPC750 generates the output signal, CKSTP_OUT, when it detects a checkstop condition. 
For a detailed description of these signals, see Section 8.7, “Interrupt, Checkstop, and Reset 
Signal Operation.” 


7.2.9.1 Interrupt (INT)—Input 


Following are the state meaning and timing comments for the INT signal. 

State Meaning Asserted—The MPC750 initiates an interrupt if MSR[EE] is set; 
otherwise, the MPC750 ignores the interrupt. To guarantee that the 
MPC750 will take the external interrupt, INT must be held active 
until the MPC750 takes the interrupt; otherwise, whether the 
MPC750 takes an external interrupt depends on whether the 
MSR[EE] bit was set while the INT signal was held active. 


Negated—Indicates that normal operation should proceed. See 
Section 8.7.1, “External Interrupts.” 


Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the input clocks. The INT input is level-sensitive. 
Negation—Should not occur until interrupt is taken. 


7.2.9.2 System Management Interrupt (SMI)—Input 


Following are the state meaning and timing comments for SMI. 


State Meaning Asserted—The MPC750 initiates a system management interrupt 
operation if the MSR[EE] is set; otherwise, the MPC750 ignores the 
exception condition. The system must hold SMI active until the 
exception is taken. 


Negated—Indicates that normal operation should proceed. See 
Section 8.7.1, “External Interrupts.” 


Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the input clocks. The SMI input is level-sensitive. 


Negation—Should not occur until interrupt is taken. 


7.2.9.3 Machine Check Interrupt (MCP)—Input 


Following are the state meaning and timing comments for the MCP signal. 


State Meaning Asserted—The MPC750 initiates a machine check interrupt 
operation if MSR[ME] and HID0[EMCP] are set; if MSR[ME] is 
cleared and HID0[EMCP] is set, the MPC750 must terminate 
operation by internally gating off all clocks, and releasing all outputs 
(except CKSTP_OUT) to the high-impedance state. If HID0[EMCP] 
is cleared, the MPC750 ignores the interrupt condition. The MCP 
signal must be held asserted for two bus clock cycles. 


Negated—Indicates that normal operation should proceed. See 
Section 8.7.1, “External Interrupts.” 


Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the input clocks. The MCP input is negative 
edge-sensitive. 


Negation—May be negated two bus cycles after assertion. 
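

The three outcomes described for an MCP assertion can be summarized as a small decision function. The following Python sketch is only an illustration of the state meaning above; the function name mcp_response and the returned strings are hypothetical labels, and the two-bus-clock hold requirement is not modeled.

```python
def mcp_response(msr_me, hid0_emcp):
    """Classify the processor response to an MCP assertion.

    msr_me    -- True if MSR[ME] is set
    hid0_emcp -- True if HID0[EMCP] is set
    The returned strings are illustrative labels, not register values.
    """
    if not hid0_emcp:
        return "ignored"          # HID0[EMCP] cleared: condition is ignored
    if msr_me:
        return "machine_check"    # both bits set: machine check interrupt
    return "checkstop"            # MSR[ME] cleared: clocks gated, outputs released


if __name__ == "__main__":
    for me in (False, True):
        for emcp in (False, True):
            print(f"MSR[ME]={int(me)} HID0[EMCP]={int(emcp)} -> {mcp_response(me, emcp)}")
```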


7.2.9.4 Checkstop Input (CKSTP_IN)—Input 


Following are the state meaning and timing comments for the CKSTP_IN signal. 


State Meaning Asserted—Indicates that the MPC750 must terminate operation by 
internally gating off all clocks, and release all outputs (except 
CKSTP_OUT) to the high-impedance state. Once CKSTP_IN has 
been asserted it must remain asserted until the system has been reset. 


Negated—Indicates that normal operation should proceed. See 
Section 8.7.2, “Checkstops.” 


Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the input clocks. 


Negation—May occur any time after the CKSTP_OUT output signal 
has been asserted. 


7.2.9.5 Checkstop Output (CKSTP_OUT)—Output 


Note that the CKSTP_OUT signal is an open-drain type output, and requires an external 
pull-up resistor (for example, 10 kΩ to VDD) to assure proper de-assertion of the 
CKSTP_OUT signal. Following are the state meaning and timing comments for the 
CKSTP_OUT signal. 


State Meaning Asserted—Indicates that the MPC750 has detected a checkstop 
condition and has ceased operation. 


Negated—Indicates that the MPC750 is operating normally. 
See Section 8.7.2, “Checkstops.” 


Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the MPC750 input clocks. 


Negation—Is negated upon assertion of HRESET. 


7.2.9.6 Reset Signals 


There are two reset signals on the MPC750—hard reset (HRESET) and soft reset 
(SRESET). Descriptions of the reset signals are as follows: 


7.2.9.6.1 Hard Reset (HRESET)—Input 


The hard reset (HRESET) signal must be used at power-on in conjunction with the TRST 
signal to properly reset the processor. Following are the state meaning and timing 
comments for the HRESET signal. 


State Meaning Asserted—Initiates a complete hard reset operation when this input 
transitions from asserted to negated. Causes a reset exception as 
described in Section 4.5.1, “System Reset Exception (0x00100).” 
Output drivers are released to high impedance within five clocks 
after the assertion of HRESET. 


Negated—Indicates that normal operation should proceed. See 
Section 8.7.3, “Reset Inputs.” 


Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the MPC750 input clock; must be held asserted 
for a minimum of 255 clock cycles after the PLL lock time has been 
met. Refer to the MPC750 hardware specifications for further timing 
comments. 


Negation—May occur any time after the minimum reset pulse width 
has been met. 


This input has additional functionality in certain test modes. 
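

As a worked example of the minimum HRESET assertion time, the Python sketch below adds the 255-cycle requirement to a PLL lock time supplied by the caller. It assumes that the "clock cycles" in the timing comment are SYSCLK cycles; the function name min_hreset_assert_ns is hypothetical, and the PLL lock time must be taken from the MPC750 hardware specifications (no value is assumed here).

```python
def min_hreset_assert_ns(sysclk_hz, pll_lock_ns, min_cycles=255):
    """Estimate the minimum HRESET assertion time in nanoseconds.

    sysclk_hz   -- SYSCLK frequency in Hz (assumed to be the clock that is counted)
    pll_lock_ns -- PLL lock time in ns, from the hardware specifications
    min_cycles  -- cycles HRESET must stay asserted after the PLL has locked
    """
    if sysclk_hz <= 0:
        raise ValueError("sysclk_hz must be positive")
    cycles_ns = min_cycles * 1e9 / sysclk_hz
    return pll_lock_ns + cycles_ns


if __name__ == "__main__":
    # Example with a 100 MHz SYSCLK and a caller-supplied lock time of 100 us.
    print(min_hreset_assert_ns(100e6, 100_000.0))
```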


7.2.9.6.2 Soft Reset (SRESET)—Input 

Following are the state meaning and timing comments for the SRESET signal. 

State Meaning Asserted—Does not initialize internal resources (different from 
HRESET assertion). However, initiates processing for a reset 
exception as described in Section 4.5.1, “System Reset Exception 
(0x00100),” (same as HRESET). 


Negated—Indicates that normal operation should proceed. See 
Section 8.7.3, “Reset Inputs.” 


Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the MPC750 input clock. The SRESET input is 
negative-edge sensitive. 


Negation—May be negated two bus cycles after assertion. 


This input has additional functionality in certain test modes. 


7.2.9.7 Processor Status Signals 


Processor status signals indicate the state of the processor. This includes the memory 
reservation signal, machine quiesce control signals, time base enable signal, and 
TLBISYNC signal. 


7.2.9.7.1 Quiescent Request (QREQ)—Output 


Following are the state meaning and timing comments for QREQ. 


State Meaning Asserted—Indicates that the MPC750 is requesting all bus activity 
normally required to be snooped to terminate or to pause so the 
MPC750 may enter a quiescent (low power) state. When the 
MPC750 has entered a quiescent state, it no longer snoops bus 
activity. 
Negated—Indicates that the MPC750 is not making a request to 
enter the quiescent state. 


Timing Comments Assertion/Negation—May occur on any cycle. QREQ will remain 
asserted for the duration of the quiescent state. 


7.2.9.7.2 Quiescent Acknowledge (QACK)—Input 
Following are the state meaning and timing comments for the QACK signal. 


State Meaning Asserted—Indicates that all bus activity that requires snooping has 
terminated or paused, and that the MPC750 may enter the quiescent 
(or low power) state. 


Negated—Indicates that the MPC750 may not enter a quiescent 
state, and must continue snooping the bus. 


Timing Comments Assertion/Negation—May occur on any cycle following the 
assertion of QREQ, and must be held asserted for at least one bus 
clock cycle. 
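

The QREQ/QACK handshake described in the two sections above can be viewed as a simple classification of the two signal levels. The Python sketch below is an illustrative model only; the function name quiesce_status and the returned labels are hypothetical, and cycle-level timing (such as the one-bus-clock minimum for QACK) is not modeled.

```python
def quiesce_status(qreq_asserted, qack_asserted):
    """Classify the quiesce handshake from the QREQ and QACK levels.

    Returns an illustrative label:
      "running"             -- MPC750 is not requesting the quiescent state
      "waiting_must_snoop"  -- request pending, bus activity not yet paused
      "may_enter_quiescent" -- system has acknowledged, snooping may stop
    """
    if not qreq_asserted:
        return "running"
    if qack_asserted:
        return "may_enter_quiescent"
    return "waiting_must_snoop"


if __name__ == "__main__":
    for qreq, qack in [(False, False), (True, False), (True, True)]:
        print(qreq, qack, quiesce_status(qreq, qack))
```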





7.2.9.7.3 Reservation (RSRV)—Output 


Following are the state meaning and timing comments for RSRV. 


State Meaning Asserted/Negated—Represents the state of the reservation 
coherency bit in the reservation address register that is used by the 
lwarx and stwcx. instructions. See Section 8.8.1, “Support for the 
lwarx/stwcx. Instruction Pair.” 


Timing Comments Assertion/Negation—Occurs synchronously with respect to bus 
clock cycles. The execution of an lwarx instruction sets the internal 
reservation condition. 


7.2.9.7.4 Time Base Enable (TBEN)—Input 


Following are the state meaning and timing comments for the TBEN signal. 


State Meaning Asserted—Indicates that the time base should continue clocking. 
This input is essentially a count enable control for the time base 
counter. 


Negated—Indicates the time base should stop clocking. 


Timing Comments Assertion/Negation—May occur on any cycle. 


7.2.9.7.5 TLBI Sync (TLBISYNC)—Input 


The TLBI Sync (TLBISYNC) signal is an input-only signal on the MPC750. Following are 
the state meaning and timing comments for the TLBISYNC signal. 


State Meaning Asserted—Indicates that instruction execution should stop after 
execution of a tlbsync instruction. 


Negated—Indicates that the instruction execution may continue or 
resume after the completion of a tlbsync instruction. 


Timing Comments Assertion/Negation—May occur on any cycle. The TLBISYNC 
signal must be held negated during HRESET. 


7.2.9.7.6 L2 Cache Interface 


The MPC750’s dedicated L2 cache interface provides all the signals required for the 
support of up to 1 Mbyte of synchronous SRAM for data storage. The use of the L2 data 
parity (L2DP[0-7]) and L2 low-power mode enable (L2ZZ) signals is optional, and 
depends on the SRAMs selected for use with the MPC750. Note that the least-significant 
bit of L2 address (L2ADDR[16-0]) signals is identified as bit 0, and the most-significant 
bit is identified as bit 16. 


Note that the L2 cache interface is not implemented in the MPC740. 
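

The 1-Mbyte limit is consistent with the widths of the L2 address and data buses. The short Python calculation below is a sketch under the assumption that each L2ADDR value selects one 64-bit double word of SRAM; the constant and function names are hypothetical.

```python
L2_ADDR_BITS = 17          # L2ADDR[16-0]
L2_DATA_BYTES = 64 // 8    # L2DATA[0-63] carries one double word per access


def l2_max_sram_bytes(addr_bits=L2_ADDR_BITS, data_bytes=L2_DATA_BYTES):
    """Largest SRAM array addressable, assuming one double word per address."""
    return (1 << addr_bits) * data_bytes


if __name__ == "__main__":
    size = l2_max_sram_bytes()
    print(size, "bytes =", size // (1024 * 1024), "Mbyte")
```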


7.2.9.8 L2 Address (L2ADDR[16-0])—Output 


Following are the state meaning and timing comments for the L2 address output signals. 


State Meaning Asserted/Negated—Represents the address of the data to be 
transferred to the L2 cache. The L2 address bus is configured with 
bit 0 as the least-significant bit. Address bit 14 determines which 
cache tag set is selected. 


Timing Comments Assertion/Negation—Driven valid by the MPC750 during read and 
write operations; driven with static data when the L2 cache memory 
is not being accessed. 




7.2.9.9 L2 Data (L2DATA[0-63]) 


The data bus (L2DATA[0-63]) consists of 64 signals that are both input and output on the 
MPC750. 


7.2.9.9.1 L2 Data (L2DATA[0-63])—Output 


Following are the state meaning and timing comments for the L2 data output signals. 


State Meaning Asserted/Negated—Represents the state of data during a data write 
transaction; data is always transferred as double words. 


Timing Comments Assertion/Negation—Driven valid by MPC750 during write 
operations; driven with static data when the L2 cache memory is not 
being accessed by a read operation. 


High Impedance—Occurs for at least one cycle when changing 
between read and write operations to the L2 cache memory. 


7.2.9.9.2 L2 Data (L2DATA[0-63])—Input 


Following are the state meaning and timing comments for the L2 data input signals. 


State Meaning Asserted/Negated—Represents the state of data during a data read 
transaction; data is always transferred as double words. 


Timing Comments Assertion/Negation—Driven valid by L2 cache memory during read 
operations. 


7.2.9.10 L2 Data Parity (L2DP[0-7]) 


The eight data bus parity (L2DP[0-7]) signals on the MPC750 are both output and input 
signals. 


7.2.9.10.1 L2 Data Parity (L2DP[0-7])—Output 

Following are the state meaning and timing comments for the L2 data parity output signals. 


State Meaning Asserted/Negated—Represents odd parity for each of the 8 bytes of 
L2 cache data during write transactions. Odd parity means that an 
odd number of bits, including the parity bit, are driven high. Note 
that parity bit 0 is associated with bits 0-7 (byte lane 0) of the 
L2DATA bus. 

Timing Comments Assertion/Negation—The same as L2DATA[0-63]. 

High Impedance—The same as L2DATA[0-63]. 


7.2.9.10.2 L2 Data Parity (L2DP[0-7])—Input 


Following are the state meaning and timing comments for the L2 parity input signals. 


State Meaning Asserted/Negated—Represents odd parity for each byte of L2 cache 
read data. 


Timing Comments Assertion/Negation—The same as L2DATA[0-63]. 
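

The odd-parity rule for L2DP[0-7] can be checked with a few lines of code. The Python sketch below computes the eight parity bits for a 64-bit double word. It assumes PowerPC bit numbering, in which L2DATA bit 0 is the most-significant bit, so that byte lane 0 is the most-significant byte of the integer; the function name l2dp_odd_parity is hypothetical.

```python
def l2dp_odd_parity(dword):
    """Return [L2DP0, ..., L2DP7] for a 64-bit value using odd parity.

    Assumption: byte lane 0 (L2DATA[0-7]) is the most-significant byte.
    Each parity bit is chosen so that the byte plus its parity bit
    contain an odd number of ones.
    """
    if not 0 <= dword < (1 << 64):
        raise ValueError("dword must fit in 64 bits")
    parity = []
    for lane in range(8):
        byte = (dword >> (8 * (7 - lane))) & 0xFF
        ones = bin(byte).count("1")
        parity.append(0 if ones % 2 == 1 else 1)
    return parity


if __name__ == "__main__":
    print(l2dp_odd_parity(0x0100000000000000))
```

The same function can model the input direction: read data is consistent when the parity bits returned by the SRAM match the values computed from the data.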


7.2.9.11 L2 Chip Enable (L2CE)—Output 


Following are the state meaning and timing comments for the L2CE signal. 
State Meaning Asserted—Indicates that the L2 cache memory devices are being 
selected for a read or write operation. 


Negated—Indicates that the MPC750 is not selecting the L2 cache 
memory devices for a read or write operation. 


Timing Comments Assertion/Negation—May occur on any cycle. L2CE is driven high 
during HRESET assertion. 


7.2.9.12 L2 Write Enable (L2WE)—Output 

Following are the state meaning and timing comments for the L2WE signal. 

State Meaning Asserted—Indicates that the MPC750 is performing a write 
operation to the L2 cache memory. 
Negated—Indicates that the MPC750 is not performing an L2 cache 
memory write operation. 


Timing Comments Assertion/Negation—May occur on any cycle. L2WE is driven high 
during HRESET assertion. 


7.2.9.13 L2 Clock Out A (L2CLK_OUTA)—Output 


Following are the state meaning and timing comments for the L2CLK_OUTA signal. 

State Meaning Asserted/Negated—Clock output for L2 cache memory devices. The 
L2CLK_OUTA signal is identical and synchronous with the 
L2CLK_OUTB signal, and provides the capability to drive up to four 
L2 cache memory devices. If differential L2 clocking is configured 
through the setting of the L2CR, the L2CLK_OUTB signal is driven 
phase inverted with relation to the L2CLK_OUTA signal. 

Timing Comments Assertion/Negation—Refer to the MPC750 hardware specifications 
for timing comments. The L2CLK_OUTA signal is driven low 
during assertion of HRESET. 


7.2.9.14 L2 Clock Out B (L2CLK_OUTB)—Output 

Following are the state meaning and timing comments for the L2CLK_OUTB signal. 


State Meaning Asserted/Negated—Clock output for L2 cache memory devices. The 
L2CLK_OUTB signal is identical and synchronous with the 
L2CLK_OUTA signal, and provides the capability to drive up to four 
L2 cache memory devices. If differential L2 clocking is configured 
through the setting of the L2CR, the L2CLK_OUTA signal is driven 
phase inverted with relation to the L2CLK_OUTB signal. 


Timing Comments Assertion/Negation—Refer to the MPC750 hardware specifications 
for timing comments. The L2CLK_OUTB signal is driven low 
during assertion of HRESET. 


7.2.9.15 L2 Sync Out (L2SYNC_OUT)—Output 
Following are the state meaning and timing comments for the L2SYNC_OUT signal. 


State Meaning Asserted/Negated—Clock output for L2 clock synchronization. The 
L2SYNC_OUT signal should be routed half of the trace length to the 
L2 cache memory devices and returned to the L2SYNC_IN signal 
input. 


Timing Comments Assertion/Negation—Refer to the MPC750 hardware specifications 
for timing comments. The L2SYNC_OUT signal is driven low 
during assertion of HRESET. 


7.2.9.16 L2 Sync In (L2SYNC_IN)—Input 


Following are the state meaning and timing comments for the L2SYNC_IN signal. 


State Meaning Asserted/Negated—Clock input for L2 clock synchronization. The 
L2SYNC_IN signal is driven by the L2SYNC_OUT signal output. 


Timing Comments Assertion/Negation—Refer to the MPC750 hardware specifications 
for timing comments. The routing of this signal on the printed circuit 
board should ensure that the rising edge at L2SYNC_IN is 
coincident with the rising edge of the clock at the clock input of the 
L2 cache memory devices. 


7.2.9.17 L2 Low-Power Mode Enable (L2ZZ)—Output 


Following are the state meaning and timing comments for the L2ZZ signal. 


State Meaning Asserted/Negated—Enables low-power mode for certain L2 cache 
memory devices. Operation of the signal is enabled through the 
L2CR. 


Timing Comments Assertion/Negation—Occurs synchronously with the L2 clock when 
the MPC750 enters and exits the nap or sleep power modes; after 
negation of this signal, at least two L2 clock cycles will elapse before 
L2 cache operations resume. The L2ZZ signal is driven low during 
assertion of HRESET. 


7.2.10 IEEE 1149.1a-1993 Interface Description 


The MPC750 has five dedicated JTAG signals which are described in Table 7-6. The test 
data input (TDI) and test data output (TDO) scan ports are used to scan instructions as well 
as data into the various scan registers for JTAG operations. The scan operation is controlled 
by the test access port (TAP) controller which in turn is controlled by the test mode select 
(TMS) input sequence. The scan data is latched in at the rising edge of test clock (TCK). 


Table 7-6. IEEE Interface Pin Descriptions 


Signal Name | Input/Output | Weak Pullup Provided | IEEE 1149.1a Function 
TDI | Input | Yes | Serial scan input signal 
TDO | Output | No | Serial scan output signal 
TMS | Input | Yes | TAP controller mode signal 
TCK | Input | Yes | Scan clock 
TRST | Input | Yes | TAP controller reset 


Test reset (TRST) is a JTAG optional signal which is used to reset the TAP controller 
asynchronously. The TRST signal assures that the JTAG logic does not interfere with the 
normal operation of the chip, and must be asserted and deasserted coincident with the 
assertion of the HRESET signal. 
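

The TMS-driven behavior of the TAP controller is defined by the IEEE 1149.1 standard rather than by this manual. As a hedged illustration, the Python sketch below encodes the standard 16-state TAP state diagram and walks it for a sequence of TMS values sampled on rising TCK edges. The names TAP_NEXT and tap_walk are hypothetical, and nothing here models MPC750-specific scan registers or instructions.

```python
# Standard IEEE 1149.1 TAP controller transitions:
# state -> (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_walk(state, tms_bits):
    """Return the TAP state reached after clocking in each TMS value."""
    for tms in tms_bits:
        state = TAP_NEXT[state][1 if tms else 0]
    return state


if __name__ == "__main__":
    # From reset, TMS = 0, 1, 0, 0 reaches the data-register shift state.
    print(tap_walk("Test-Logic-Reset", [0, 1, 0, 0]))
```

In the standard state diagram, holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is the synchronous counterpart of the asynchronous reset provided by TRST.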


7.2.11 Clock Signals 


The MPC750 clock signal inputs determine the system clock frequency and provide a 
flexible clocking scheme that allows the processor to operate at an integer or half-integer multiple of the 
system clock frequency. 


Refer to the MPC750 hardware specifications for exact timing relationships of the clock 
signals. 


7.2.11.1 System Clock (SYSCLK)—Input 


The MPC750 requires a single system clock (SYSCLK) input. This input sets the frequency 
of operation for the bus interface. Internally, the MPC750 uses a phase-locked loop (PLL) 
circuit to generate a master clock for all of the CPU circuitry (including the bus interface 
circuitry) which is phase-locked to the SYSCLK input. The master clock may be set to an 
integer or half-integer multiple (2:1, 2.5:1, 3:1, 3.5:1, 4:1, 4.5:1, 5:1, 5.5:1, 6:1, 6.5:1, or 
7:1) of the SYSCLK frequency allowing the CPU core to operate at an equal or greater 
frequency than the bus interface. 


State Meaning Asserted/Negated—The SYSCLK input is the primary clock input 
for the MPC750, and represents the bus clock frequency for 
MPC750 bus operation. Internally, the MPC750 may be operating at 
an integer or half-integer multiple of the bus clock frequency. 


Timing Comments Duty cycle—Refer to the MPC750 hardware specifications for 
timing comments. 
Note: SYSCLK is used as the frequency reference for the internal 
PLL clock generator, and must not be suspended or varied during 
normal operation to ensure proper PLL operation. 
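

The relationship between SYSCLK and the internal master clock reduces to a multiplication by one of the listed ratios. The Python sketch below checks a requested ratio against the list given in this section and returns the resulting core frequency. The names ALLOWED_RATIOS and core_clock_hz are hypothetical, and the PLL_CFG encodings and frequency limits in the hardware specifications are not modeled.

```python
from fractions import Fraction

# Core-to-bus ratios listed in this section: 2:1, 2.5:1, ..., 7:1
ALLOWED_RATIOS = tuple(Fraction(n, 2) for n in range(4, 15))


def core_clock_hz(sysclk_hz, ratio):
    """Return the core clock frequency for a SYSCLK frequency and ratio.

    ratio may be an int, float, or string such as "3.5".
    Raises ValueError if the ratio is not one of the listed multiples.
    """
    r = Fraction(str(ratio))
    if r not in ALLOWED_RATIOS:
        raise ValueError(f"unsupported core-to-bus ratio: {ratio}")
    return float(Fraction(sysclk_hz) * r)


if __name__ == "__main__":
    print(core_clock_hz(66_000_000, 3.5))
```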


7.2.11.2 Clock Out (CLK_OUT)—Output 


The clock out (CLK_OUT) signal is an output signal (output-only) on the MPC750. 
Following are the state meaning and timing comments for the CLK_OUT signal. 


State Meaning Asserted/Negated—Provides PLL clock output for PLL testing and 
monitoring. The configuration of the HID0[SBCLK] and 
HID0[ECLK] bits determines whether the CLK_OUT signal clocks 
at either the processor clock frequency, the bus clock frequency, or 
half of the bus clock frequency. See Table 2-5 for HID0 register 
configuration of the CLK_OUT signal. The CLK_OUT signal 
defaults to a high-impedance state following the assertion of 
HRESET. The CLK_OUT signal is provided for testing only. 


Timing Comments Assertion/Negation—Refer to the MPC750 hardware specifications 
for timing comments. 


7.2.11.3 PLL Configuration (PLL_CFG[0-3])—Input 


The PLL (phase-locked loop) is configured by the PLL_CFG[0-3] signals. For a given 
SYSCLK (bus) frequency, the PLL configuration signals set the internal CPU frequency of 
operation. Refer to the MPC750 hardware specifications for PLL configuration. 


Following are the state meaning and timing comments for the PLL_CFG[0-3] signals. 


State Meaning Asserted/Negated—Configures the operation of the PLL and the 
internal processor clock frequency. Settings are based on the desired 
bus and internal frequency of operation. 


Timing Comments Assertion/Negation—Must remain stable during operation; should 
only be changed during the assertion of HRESET or during sleep 
mode. These bits may be read through the PC[0-3] bits in the HID1 
register. 


7.2.12 Power and Ground Signals 


The MPC750 provides the following connections for power and ground: 

• VDD—The VDD signals provide the supply voltage connection for the processor 
core. 

• OVDD—The OVDD signals provide the supply voltage connection for the system 
interface drivers. 

• L2VDD—The L2VDD signals provide the supply voltage connection for the L2 
cache interface drivers. These power supply signals are isolated from the VDD and 
OVDD power supply signals. These signals are not implemented on the MPC740. 

• AVDD—The AVDD power signal provides power to the clock generation 
phase-locked loop. See the MPC750 hardware specifications for information on 
how to use this signal. 

• L2AVDD—The L2AVDD power signal provides power to the L2 delay-locked loop. 
See the MPC750 hardware specifications for information on how to use this signal. 
This signal is not implemented on the MPC740. 

• GND and OGND—The GND and OGND signals provide the connection for 
grounding the MPC750. On the MPC750, there is no electrical distinction between 
the GND and OGND signals. 

• L2GND—The L2GND signals provide the ground connection for the L2 cache 
interface. These ground signals are isolated from the GND and OGND ground 
signals. These signals are not implemented on the MPC740. 


System Interface Operation 


This chapter describes the MPC750 microprocessor bus interface and its operation. It 
shows how the MPC750 signals, defined in Chapter 7, “Signal Descriptions,” interact to 
perform address and data transfers. Note that the MPC755 microprocessor is a derivative 
of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted 
in Appendix C, “MPC755 Embedded G3 Microprocessor.” 


8.1 MPC750 System Interface Overview 


The system interface prioritizes requests for bus operations from the instruction and data 
caches, and performs bus operations in accordance with the 60x bus protocol. It includes 
address register queues, prioritization logic, and a bus control unit. The system interface 
latches snoop addresses for snooping in the data cache and in the address register queues, 
and for reservations controlled by the Load Word and Reserve Indexed (lwarx) and Store 
Word Conditional Indexed (stwcx.) instructions, and maintains the touch load address for 
the cache. The interface allows one level of pipelining; that is, with certain restrictions 
discussed later, there can be two outstanding transactions at any given time. Accesses are 
prioritized with load operations preceding store operations. 


Instructions are automatically fetched from the memory system into the instruction unit 
where they are dispatched to the execution units at a peak rate of two instructions per clock. 
Conversely, load and store instructions explicitly specify the movement of operands to and 
from the integer and floating-point register files and the memory system. 


When the MPC750 encounters an instruction or data access, it calculates the logical address 
(effective address in the architecture specification) and uses the low-order address bits to 
check for a hit in the on-chip, 32-Kbyte instruction and data caches. During cache lookup, 
the instruction and data memory management units (MMUs) use the higher-order address 
bits to calculate the virtual address, from which they calculate the physical address (real 
address in the architecture specification). The physical address bits are then compared with 
the corresponding cache tag bits to determine if a cache hit occurred in the L1 instruction 
or data cache. If the access misses in the corresponding cache, the physical address is used 
to access the L2 cache tags (if the L2 cache is enabled). If no match is found in the L2 cache 
tags, the physical address is used to access system memory. 


In addition to the loads, stores, and instruction fetches, the MPC750 performs hardware 
table search operations following TLB misses, L2 cache cast-out operations when 
least-recently used cache lines are written to memory after a cache miss, and cache-line 
snoop push-out operations when a modified cache line experiences a snoop hit from another 
bus master. 


Figure 8-1 shows the address path from the execution units and instruction fetcher, through 
the translation logic to the caches and system interface logic. 


The MPC750 uses separate address and data buses and a variety of control and status 
signals for performing reads and writes. The address bus is 32 bits wide and the data bus is 
64 bits wide. The interface is synchronous—all MPC750 inputs are sampled at and all 
outputs are driven from the rising edge of the bus clock. The processor runs at a multiple of 
the bus-clock speed. The MPC750 core operates at 2.5 volts, and the I/O signals operate at 
3.3 volts. 


8.1.1 Operation of the Instruction and Data L1 Caches 


The MPC750 provides independent instruction and data L1 caches. Each cache is a 
physically-addressed, 32-Kbyte cache with eight-way set associativity. Both caches consist 
of 128 sets of eight cache lines, with eight words in each cache line. 
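

The cache geometry above implies a particular split of a 32-bit physical address. The Python sketch below derives it from the stated numbers (128 sets, eight ways, eight 4-byte words per line) and is an illustration rather than a description of the actual hardware; the function name split_physical_address and the tag/index/offset naming are assumptions. Under this geometry the offset and index together occupy the low-order 12 address bits.

```python
WORD_BYTES = 4
LINE_BYTES = 8 * WORD_BYTES      # eight words per cache line = 32 bytes
NUM_SETS = 128
NUM_WAYS = 8

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5 bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1      # 7 bits select the set


def split_physical_address(pa):
    """Split a 32-bit physical address into (tag, set index, byte offset)."""
    offset = pa & (LINE_BYTES - 1)
    index = (pa >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = pa >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print("cache size:", NUM_SETS * NUM_WAYS * LINE_BYTES, "bytes")
    print([hex(field) for field in split_physical_address(0x12345678)])
```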


Because the data cache on the MPC750 is an on-chip, write-back primary cache, the 
predominant type of transaction for most applications is burst-read memory operations, 
followed by burst-write memory operations and single-beat (noncacheable or 
write-through) memory read and write operations. Additionally, there can be address-only 
operations, variants of the burst and single-beat operations (global memory operations that 
are snooped, and atomic memory operations, for example), and address retry activity (for 
example, when a snooped read access hits a modified line in the cache). 


Since the MPC750 data cache tags are single ported, simultaneous load or store and snoop 
accesses cause resource contention. Snoop accesses have the highest priority and are given 
first access to the tags, unless the snoop access coincides with a tag write, in which case the 
snoop is retried and must re-arbitrate for access to the cache. Loads or stores that are 
deferred due to snoop accesses are performed on the clock cycle following the snoop. 


The MPC750 implements a three-state coherency protocol that supports the modified, 
exclusive, and invalid (MEI) cache states. The protocol is a subset of the MESI 
(modified/exclusive/shared/invalid) four-state protocol and operates coherently in systems 
that contain four-state caches. With the exception of the dcbz instruction (and the dcbi, 
dcbst, and dcbf instructions, if HID0[ABE] is enabled), the MPC750 does not broadcast 
cache control instructions. The cache control instructions are intended for the management 
of the local cache but not for other caches in the system. 


Cache lines in the MPC750 are loaded in four beats of 64 bits each. The burst load is 
performed as critical double word first. The critical double word is simultaneously written 
to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays. 
If subsequent loads follow in sequential order, the instructions or data will be forwarded to 
the requesting unit as the cache block is written. 


Cache lines are selected for replacement based on a pseudo least-recently-used (PLRU) 
algorithm. Each time a cache line is accessed, it is tagged as the most-recently-used line of 
the set. When a miss occurs, and all eight lines in the set are marked as valid, the least 
recently used line is replaced with the new data. When data to be replaced is in the modified 
state, the modified data is written into a write-back buffer while the missed data is being 
read from memory. When the load completes, the MPC750 then pushes the replaced line 
from the write-back buffer to the L2 cache (if enabled), or to main memory in a burst write 
operation. 
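

A common way to implement pseudo-LRU for eight ways is a binary tree of seven bits per set. The Python sketch below shows that generic technique for one set. It is not taken from the MPC750 documentation: the class name TreePLRU8, the bit layout, and the convention that a bit value of 0 points to the left subtree are assumptions made for illustration, and the exact bit encoding used by the processor may differ.

```python
class TreePLRU8:
    """Generic tree pseudo-LRU for one 8-way set (illustrative only)."""

    def __init__(self):
        # Seven tree nodes stored heap-style: node i has children 2i+1, 2i+2.
        # Assumed convention: bit 0 means the replacement candidate is in the
        # left subtree, bit 1 means it is in the right subtree.
        self.bits = [0] * 7

    def touch(self, way):
        """Mark a way as most recently used."""
        node = 0
        for level in (2, 1, 0):
            go_right = (way >> level) & 1
            # Point this node away from the subtree that was just used.
            self.bits[node] = 0 if go_right else 1
            node = 2 * node + 1 + go_right

    def victim(self, valid=None):
        """Pick a way to replace; prefer an invalid way if one exists."""
        if valid is not None:
            for way, is_valid in enumerate(valid):
                if not is_valid:
                    return way
        node = 0
        way = 0
        for _ in range(3):
            go_right = self.bits[node]
            way = (way << 1) | go_right
            node = 2 * node + 1 + go_right
        return way


if __name__ == "__main__":
    plru = TreePLRU8()
    for w in range(8):
        plru.touch(w)
    print("victim after touching ways 0..7:", plru.victim())
```

Because only seven bits are kept per set, the chosen line is not always the true least-recently-used line, which is why the algorithm is described as pseudo-LRU.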


8.1.2 Operation of the L2 Cache 


The MPC750 provides an on-chip, two-way set associative tag memory, and a dedicated L2 
cache port with support for up to 1 Mbyte of external synchronous SRAMs for data storage. 
The L2 cache normally operates in copy-back mode and supports system cache coherency 
through snooping. Designers should note that the MPC740 does not implement the on-chip 
L2 tag memory, or the signals required for the support of the external SRAMs, and memory 
accesses go directly to the bus interface unit. 


The L2 cache receives independent memory access requests from both the L1 instruction 
and data caches. The L1 accesses are compared to the L2 cache tags and the data or 
instructions are forwarded from the L2 to the L1 cache if there is a cache hit, or are 
forwarded on to the bus interface unit if there is an L2 cache miss, or if the address being 
accessed is from a page marked as caching-inhibited. Burst read accesses that miss in the 
L2 cache initiate a load operation from the bus interface. As the load operation transfers 
data to the L1 cache, the data is also loaded into the L2 cache, and marked as valid 
unmodified in the L2 cache tags. An L1 load, store, or castout operation can cause an L2 
cache block allocation resulting in the castout of an L2 cache block marked modified to the 
bus interface. For additional information about the operation of the L2 cache, refer to 
Chapter 9, “L2 Cache Interface Operation.” 


8.1.3 Operation of the System Interface 


Memory accesses can occur in single-beat (1, 2, 3, 4, and 8 bytes) and four-beat (32 bytes) 
burst data transfers. The address and data buses are independent for memory accesses to 
support pipelining and split transactions. The MPC750 can pipeline as many as two 
transactions and has limited support for out-of-order split-bus transactions. 


Access to the system interface is granted through an external arbitration mechanism that 
allows devices to compete for bus mastership. This arbitration mechanism is flexible, 
allowing the MPC750 to be integrated into systems that implement various fairness and 
bus-parking procedures to avoid arbitration overhead. 


Typically, memory accesses are weakly ordered to maximize the efficiency of the bus 
without sacrificing coherency of the data. The MPC750 allows load operations to bypass 
store operations (except when a dependency exists). In addition, the MPC750 can be
configured to reorder high-priority store operations ahead of lower-priority store
operations. Because the processor can dynamically optimize run-time ordering of 
load/store traffic, overall performance is improved. 


Note that the synchronize (sync) and enforce in-order execution of IO (eieio) instructions 
can be used to enforce strong ordering. 


The following sections describe how the MPC750 interface operates, providing detailed 
timing diagrams that illustrate how the signals interact. A collection of more general timing
diagrams is included as examples of typical bus operations.


Figure 8-2 is a legend of the conventions used in the timing diagrams. 

    Bar over signal name indicates active low
    ap0       MPC750 input (while MPC750 is a bus master)
    BR        MPC750 output (while MPC750 is a bus master)
    ADDR+     MPC750 output (grouped: here, address plus attributes)
    qual BG   MPC750 internal signal (inaccessible to the user, but
              used in diagrams to clarify operations)
    Compelling dependency: event will occur on the next clock cycle
    Prerequisite dependency: event will occur on an undetermined
              subsequent clock cycle
    MPC750 three-state output or input
    MPC750 nonsampled input
    Signal with sample point
    A sampled condition (dot on high or low state) with multiple dependencies
    Timing for a signal had it been asserted (it is not actually asserted)


Figure 8-2. Timing Diagram Legend


This is a synchronous interface—all MPC750 input signals are sampled and output signals
are driven on the rising edge of the bus clock cycle (see the MPC750 hardware 
specifications for exact timing information). 


8.1.4 Direct-Store Accesses 


The MPC750 does not support the extended transfer protocol for accesses to the 
direct-store storage space. The transfer protocol used for any given access is selected by the 
T bit in the MMU segment registers; if the T bit is set, the memory access is a direct-store 
access. An attempt to access instructions or data in a direct-store segment will result in the 
MPC750 taking an ISI or DSI exception. 


8.2 Memory Access Protocol 


Memory accesses are divided into address and data tenures. Each tenure has three 
phases—bus arbitration, transfer, and termination. The MPC750 also supports address-only 
transactions. Note that address and data tenures can overlap, as shown in Figure 8-3. 


Figure 8-3 shows that the address and data tenures are distinct from one another and that 
both consist of three phases—arbitration, transfer, and termination. Address and data 
tenures are independent (indicated in Figure 8-3 by the fact that the data tenure begins 
before the address tenure ends), which allows split-bus transactions to be implemented at 
the system level in multiprocessor systems. Figure 8-3 shows a data transfer that consists 
of a single-beat transfer of as many as 64 bits. Four-beat burst transfers of 32-byte cache 
lines require data transfer termination signals for each beat of data. 


ADDRESS TENURE 


ARBITRATION | TRANSFER | TERMINATION 





INDEPENDENT ADDRESS AND DATA 


DATA TENURE 


ARBITRATION | SINGLE-BEAT TRANSFER | TERMINATION 


Figure 8-3. Overlapping Tenures on the MPC750 Bus for a Single-Beat Transfer 


The basic functions of the address and data tenures are as follows: 
• Address tenure

  - Arbitration: During arbitration, address bus arbitration signals are used to gain
    mastership of the address bus.

  - Transfer: After the MPC750 is the address bus master, it transfers the address on
    the address bus. The address signals and the transfer attribute signals control the
    address transfer. The address parity and address parity error signals ensure the
    integrity of the address transfer.

  - Termination: After the address transfer, the system signals that the address tenure
    is complete or that it must be repeated.

• Data tenure

  - Arbitration: To begin the data tenure, the MPC750 arbitrates for mastership of
    the data bus.

  - Transfer: After the MPC750 is the data bus master, it samples the data bus for
    read operations or drives the data bus for write operations. The data parity and
    data parity error signals ensure the integrity of the data transfer.

  - Termination: Data termination signals are required after each data beat in a data
    transfer. Note that in a single-beat transaction, the data termination signals also
    indicate the end of the tenure, while in burst accesses, the data termination
    signals apply to individual beats and indicate the end of the tenure only after the
    final data beat.


The MPC750 generates an address-only bus transfer during the execution of the dcbz 
instruction (and for the dcbi, dcbf, dcbst, sync, and eieio instructions, if HID0[ABE] is
enabled), which uses only the address bus with no data transfer involved. Additionally, the 
MPC750’s retry capability provides an efficient snooping protocol for systems with 
multiple memory systems (including caches) that must remain coherent. 


8.2.1 Arbitration Signals


Arbitration for both address and data bus mastership is performed by a central, external 
arbiter and, minimally, by the arbitration signals shown in Section 7.2.1, “Address Bus 
Arbitration Signals.” Most arbiter implementations require additional signals to coordinate 
bus master/slave/snooping activities. Note that address bus busy (ABB) and data bus busy 
(DBB) are bidirectional signals. These signals are inputs unless the MPC750 has 
mastership of one or both of the respective buses; they must be connected high through 
pull-up resistors so that they remain negated when no devices have control of the buses. 





The following list describes the address arbitration signals: 


• BR (bus request): Assertion indicates that the MPC750 is requesting mastership
  of the address bus.

• BG (bus grant): Assertion indicates that the MPC750 may, with the proper
  qualification, assume mastership of the address bus. A qualified bus grant occurs
  when BG is asserted and ABB and ARTRY are negated.

  If the MPC750 is parked, BR need not be asserted for the qualified bus grant.

• ABB (address bus busy): Assertion by the MPC750 indicates that the MPC750 is
  the address bus master.


The following list describes the data arbitration signals: 


• DBG (data bus grant): Indicates that the MPC750 may, with the proper
  qualification, assume mastership of the data bus. A qualified data bus grant occurs
  when DBG is asserted while DBB, DRTRY, and ARTRY are negated.

  The DBB signal is driven by the current bus master, DRTRY is only driven from the
  bus, and ARTRY is from the bus, but only for the address bus tenure associated with
  the current data bus tenure (that is, not from another address tenure).

• DBWO (data bus write only): Assertion indicates that the MPC750 may perform
  the data bus tenure for an outstanding write address even if a read address is
  pipelined before the write address. If DBWO is asserted, the MPC750 will assume
  data bus mastership for a pending data bus write operation; the MPC750 will take
  the data bus for a pending read operation if this input is asserted along with DBG
  and no write is pending. Care must be taken with DBWO to ensure the desired write
  is queued (for example, a cache-line snoop push-out operation).

• DBB (data bus busy): Assertion by the MPC750 indicates that the MPC750 is the
  data bus master. The MPC750 always assumes data bus mastership if it needs the
  data bus and is given a qualified data bus grant (see DBG).


























For more detailed information on the arbitration signals, refer to Section 7.2.1, 
“Address Bus Arbitration Signals,” and Section 7.2.6, “Data Bus Arbitration 
Signals.” 
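

The data bus qualification and the effect of DBWO described in the list above can be sketched as two small predicates. This is an illustrative model, not MPC750 logic: the function names, the list-of-strings queue, and the choice of the oldest pending write are assumptions. Every argument is True when the corresponding signal is asserted in the logical sense, regardless of pin polarity.

```python
def qualified_data_bus_grant(dbg, dbb, drtry, artry):
    """DBG asserted while DBB, DRTRY, and ARTRY are negated.

    `artry` refers only to the address tenure associated with the
    data tenure being considered.
    """
    return dbg and not dbb and not drtry and not artry


def select_data_tenure(pending, dbg, dbb, drtry, artry, dbwo):
    """Pick which pending data tenure runs next.

    `pending` lists outstanding tenures in address-tenure order, each
    either "read" or "write" (hypothetical representation). Returns the
    index of the tenure to run, or None if the data bus is not taken.
    """
    if not pending or not qualified_data_bus_grant(dbg, dbb, drtry, artry):
        return None
    if dbwo and "write" in pending:
        return pending.index("write")   # write moves ahead of earlier reads
    return 0                            # otherwise strict address order


# A read address is pipelined ahead of a write address.
queue = ["read", "write"]
print(select_data_tenure(queue, True, False, False, False, dbwo=False))  # 0
print(select_data_tenure(queue, True, False, False, False, dbwo=True))   # 1
```

With DBWO asserted and no write pending, the sketch falls back to the oldest read, matching the behavior described for DBWO asserted along with DBG.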


8.2.2 Address Pipelining and Split-Bus Transactions 


The MPC750 protocol provides independent address and data bus capability to support 
pipelined and split-bus transaction system organizations. Address pipelining allows the 
address tenure of a new bus transaction to begin before the data tenure of the current 
transaction has finished. Split-bus transaction capability allows other bus activity to occur 
(either from the same master or from different masters) between the address and data 
tenures of a transaction. 


While this capability does not inherently reduce memory latency, support for address 
pipelining and split-bus transactions can greatly improve effective bus/memory throughput. 
For this reason, these techniques are most effective in shared-memory multimaster 
implementations where bus bandwidth is an important measurement of system 
performance. 


External arbitration is required in systems in which multiple devices must compete for the 
system bus. The design of the external arbiter affects pipelining by regulating address bus 
grant (BG), data bus grant (DBG), and address acknowledge (AACK) signals. For example, 
a one-level pipeline is enabled by asserting AACK to the current address bus master and
granting mastership of the address bus to the next requesting master before the current data
bus tenure has completed. Two address tenures can occur before the current data bus tenure
completes.


The MPC750 can pipeline its own transactions to a depth of one level (intraprocessor 
pipelining); however, the MPC750 bus protocol does not constrain the maximum number 
of levels of pipelining that can occur on the bus between multiple masters (interprocessor 
pipelining). The external arbiter must control the pipeline depth and synchronization 
between masters and slaves. 


In a pipelined implementation, data bus tenures are kept in strict order with respect to 
address tenures. However, external hardware can further decouple the address and data 
buses, allowing the data tenures to occur out of order with respect to the address tenures. 
This requires some form of system tag to associate the out-of-order data transaction with 
the proper originating address transaction (not defined for the MPC750 interface). 
Individual bus requests and data bus grants from each processor can be used by the system 
to implement tags to support interprocessor, out-of-order transactions. 


The MPC750 supports a limited intraprocessor out-of-order, split-transaction capability via 
the data bus write only (DBWO) signal. For more information about using DBWO, see 
Section 8.10, “Using Data Bus Write Only.” 








8.3 Address Bus Tenure 


This section describes the three phases of the address tenure—address bus arbitration, 
address transfer, and address termination. 


8.3.1 Address Bus Arbitration


When the MPC750 needs access to the external bus and it is not parked (BG is negated), it 
asserts bus request (BR) until it is granted mastership of the bus and the bus is available (see 
Figure 8-4). The external arbiter must grant master-elect status to the potential master by 
asserting the bus grant (BG) signal. The MPC750 requesting the bus determines that the bus 
is available when the ABB input is negated. A qualified bus grant exists when BG is
asserted while the address bus is not busy (ABB input is negated) and the address retry
(ARTRY) input is negated. The potential master assumes address bus mastership by
asserting ABB when it receives a qualified bus grant.


Figure 8-4. Address Bus Arbitration


External arbiters must allow only one device at a time to be the address bus master. 
In implementations in which no other device can be a master, BG can be grounded (always
asserted) to continually grant mastership of the address bus to the MPC750.


If the MPC750 asserts BR before the external arbiter asserts BG, the MPC750 is considered 
to be unparked, as shown in Figure 8-4. Figure 8-5 shows the parked case, where a qualified 
bus grant exists on the clock edge following a need_bus condition. Notice that the bus clock 
cycle required for arbitration is eliminated if the MPC750 is parked, reducing overall 
memory latency for a transaction. The MPC750 always negates ABB for at least one bus 
clock cycle after AACK is asserted, even if it is parked and has another transaction pending. 





Typically, bus parking is provided to the device that was the most recent bus master; 
however, system designers may choose other schemes such as providing unrequested bus 
grants in situations where it is easy to correctly predict the next device requesting bus 
mastership.


Figure 8-5. Address Bus Arbitration Showing Bus Parking


When the MPC750 receives a qualified bus grant, it assumes address bus mastership by 
asserting ABB and negating the BR output signal. Meanwhile, the MPC750 drives the 
address for the requested access onto the address bus and asserts TS to indicate the start of 
a new transaction. 





When designing external bus arbitration logic, note that the MPC750 may assert BR 
without using the bus after it receives the qualified bus grant. For example, in a system using 
bus snooping, if the MPC750 asserts BR to perform a replacement copy-back operation, 
another device can invalidate that line before the MPC750 is granted mastership of the bus. 
Once the MPC750 is granted the bus, it no longer needs to perform the copy-back 
operation; therefore, the MPC750 does not assert ABB and does not use the bus for the 
copy-back operation. Note that the MPC750 asserts BR for at least one clock cycle in these 
instances. 





System designers should note that it is possible to ignore the ABB signal, and regenerate 
the state of ABB locally within each device by monitoring the TS and AACK input signals. 
The MPC750 allows this operation by using both the ABB input signal and a locally 
regenerated version of ABB to determine if a qualified bus grant state exists (both sources 
are internally ORed together). The ABB signal may only be ignored if ABB and TS are 
asserted simultaneously by all masters, or where arbitration (through assertion of BG) is 
properly managed in cases where the regenerated ABB may not properly track the ABB 
signal on the bus. If the MPC750’s ABB signal is ignored by the system, it must be 
connected to a pull-up resistor to ensure proper operation. Additionally, the MPC750 will 
not qualify a bus grant during the cycle that TS is asserted on the bus by any master. Address
bus arbitration without the use of the ABB signal requires that every assertion of TS be
acknowledged by an assertion of AACK while the processor is not in sleep mode. 
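

The qualification rules in this section can be combined into a single predicate. The sketch below is illustrative only: the function and parameter names are assumptions, each argument is True when the signal is asserted in the logical sense, and the logic that regenerates ABB from TS and AACK is not modeled.

```python
def qualified_bus_grant(bg, abb_pin, abb_regenerated, artry, ts):
    """Return True when an address bus grant is qualified.

    bg               BG asserted by the external arbiter
    abb_pin          ABB sampled asserted on the bus
    abb_regenerated  locally regenerated ABB (derived from TS and AACK)
    artry            ARTRY asserted
    ts               TS asserted on the bus by any master this cycle
    """
    # The ABB pin and the regenerated ABB are ORed together internally.
    address_bus_busy = abb_pin or abb_regenerated
    # A grant is not qualified during a cycle in which TS is asserted.
    return bg and not address_bus_busy and not artry and not ts


# Parked case: BG already asserted, bus idle, so the grant is qualified
# without a separate arbitration cycle.
print(qualified_bus_grant(bg=True, abb_pin=False, abb_regenerated=False,
                          artry=False, ts=False))   # True
```

In a system that ignores the ABB signal, abb_pin stays negated through the pull-up resistor and only the regenerated term can block the grant.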





8.3.2 Address Transfer 


During the address transfer, the physical address and all attributes of the transaction are 
transferred from the bus master to the slave device(s). Snooping logic may monitor the 
transfer to enforce cache coherency; see discussion about snooping in Section 8.3.3, 
“Address Transfer Termination.” 


The signals used in the address transfer include the following signal groups: 
• Address transfer start signal: transfer start (TS)

• Address transfer signals: address bus (A[0-31]), and address parity (AP[0-3])

• Address transfer attribute signals: transfer type (TT[0-4]), transfer size
  (TSIZ[0-2]), transfer burst (TBST), cache inhibit (CI), write-through (WT), and
  global (GBL)








Figure 8-6 shows that the timing for all of these signals, except TS, is identical. All of the 
address transfer and address transfer attribute signals are combined into the ADDR+ 
grouping in Figure 8-6. The TS signal indicates that the MPC750 has begun an address 
transfer and that the address and transfer attributes are valid (within the context of a 
synchronous bus). The MPC750 always asserts TS coincident with ABB. As an input, TS 
need not coincide with the assertion of ABB on the bus (that is, TS can be asserted with, or 
on, a subsequent clock cycle after ABB is asserted; the MPC750 tracks this transaction 
correctly). 











In Figure 8-6, the address transfer occurs during bus clock cycles 1 and 2 (arbitration occurs 
in bus clock cycle 0 and the address transfer is terminated in bus clock 3). In this diagram, 
the address bus termination input, AACK, is asserted to the MPC750 on the bus clock 
following assertion of TS (as shown by the dependency line). This is the minimum duration 
of the address transfer for the MPC750; the duration can be extended by delaying the 
assertion of AACK for one or more bus clocks.


Figure 8-6. Address Bus Transfer


8.3.2.1 Address Bus Parity


The MPC750 always generates 1 bit of correct odd-byte parity for each of the 4 bytes of
address when a valid address is on the bus. The calculated values are placed on the AP[0-3]
outputs when the MPC750 is the address bus master. If the MPC750 is not the master and
TS and GBL are asserted together (qualified condition for snooping memory operations),
the calculated values are compared with the AP[0-3] inputs. If there is an error, and address
parity checking is enabled (HID0[EBA] set to 1), a machine check exception is generated.
An address bus parity error causes a checkstop condition if MSR[ME] is cleared to 0. For 
more information about checkstop conditions, see Chapter 4, “Exceptions.” 
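

As a worked illustration of odd-byte parity, the sketch below computes four parity bits for a 32-bit address. It assumes that AP0 covers the most significant address byte (A[0-7]) and AP3 the least significant (A[24-31]), and that odd parity means each byte together with its parity bit contains an odd number of ones. The function name is hypothetical.

```python
def address_parity(addr):
    """Return [AP0, AP1, AP2, AP3] as 0/1 values for a 32-bit address."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    parity_bits = []
    for byte_index in range(4):
        shift = 24 - 8 * byte_index          # byte 0 is A[0-7], the MSB
        byte = (addr >> shift) & 0xFF
        ones = bin(byte).count("1")
        # Odd parity: the parity bit makes the total count of ones odd.
        parity_bits.append(0 if ones % 2 == 1 else 1)
    return parity_bits


print(address_parity(0x01000003))   # [0, 1, 1, 1]
```

A snooping device would recompute the same four bits from the sampled address and compare them with the AP[0-3] inputs.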





8.3.2.2 Address Transfer Attribute Signals 


The transfer attribute signals include several encoded signals such as the transfer type 
(TT[0-4]) signals, transfer burst (TBST) signal, transfer size (TSIZ[0-2]) signals,
write-through (WT), and cache inhibit (CI). Section 7.2.4, “Address Transfer Attribute 
Signals,” describes the encodings for the address transfer attribute signals. 








8.3.2.2.1 Transfer Type (TT[0-4]) Signals


Snooping logic should fully decode the transfer type signals if the GBL signal is asserted. 
Slave devices can sometimes use the individual transfer type signals without fully decoding 
the group. For a complete description of the encoding for TT[0-4], refer to Table 8-1 and
Table 8-2. 




8.3.2.2.2 Transfer Size (TSIZ[0-2]) Signals


The TSIZ[0-2] signals indicate the size of the requested data transfer as shown in Table 8-1.
The TSIZ[0-2] signals may be used along with TBST and A[29-31] to determine which
portion of the data bus contains valid data for a write transaction or which portion of the
bus should contain valid data for a read transaction. Note that for a burst transaction (as
indicated by the assertion of TBST), TSIZ[0-2] are always set to 0b010. Therefore, if the
TBST signal is asserted, the memory system should transfer a total of eight words (32
bytes), regardless of the TSIZ[0-2] encodings.











Table 8-1. Transfer Size Signal Encodings

TBST       TSIZ0   TSIZ1   TSIZ2   Transfer Size
Asserted   0       1       0       Eight-word burst
Negated    0       0       0       Eight bytes
Negated    0       0       1       One byte
Negated    0       1       0       Two bytes
Negated    0       1       1       Three bytes
Negated    1       0       0       Four bytes
Negated    1       0       1       Five bytes (N/A)
Negated    1       1       0       Six bytes (N/A)
Negated    1       1       1       Seven bytes (N/A)








The basic coherency size of the bus is defined to be 32 bytes (corresponding to one cache 
line). Data transfers that cross an aligned, 32-byte boundary either must present a new 
address onto the bus at that boundary (for coherency consideration) or must operate as 
noncoherent data with respect to the MPC750. The MPC750 never generates a bus 
transaction with a transfer size of 5 bytes, 6 bytes, or 7 bytes. 
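

The encodings in Table 8-1 can be decoded as in the following sketch, written from the point of view of a memory controller. The function name and the convention of passing TSIZ[0-2] as a 3-bit integer with TSIZ0 as the most significant bit are assumptions made for illustration.

```python
def decode_transfer_size(tbst_asserted, tsiz):
    """Return the transfer size in bytes for a TBST/TSIZ[0-2] combination.

    tbst_asserted  True when TBST is asserted (burst transaction)
    tsiz           TSIZ[0-2] as an integer 0..7, TSIZ0 being the MSB
    """
    if not 0 <= tsiz <= 7:
        raise ValueError("TSIZ[0-2] is a 3-bit field")
    if tbst_asserted:
        return 32            # eight-word burst, regardless of TSIZ[0-2]
    if tsiz == 0b000:
        return 8             # 0b000 encodes a double word
    return tsiz              # 0b001..0b111 encode 1..7 bytes directly


print(decode_transfer_size(False, 0b100))   # 4
print(decode_transfer_size(True, 0b010))    # 32
```

The sizes 5, 6, and 7 decode without error because the encodings exist, although the MPC750 never generates them.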


For operations generated by the eciwx/ecowx instructions, a transfer size of 4 bytes is 
implied, and the TBST and TSIZ[0-2] signals are redefined to specify the resource ID
(RID). The RID is copied from bits 28-31 of the external access register (EAR). For these
operations, the TBST signal carries the EAR[28] data without inversion (active high). 
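

The RID mapping can be sketched as follows. The text states only that TBST carries EAR[28]; placing EAR[29-31] on TSIZ[0-2] in order is an inference made for this example, as are the function name and the dictionary keys. PowerPC bit numbering is assumed, so EAR[28-31] are the four least significant bits of the register.

```python
def eciwx_ecowx_rid_signals(ear):
    """Derive the RID-related signal values from a 32-bit EAR value."""
    rid = ear & 0xF                  # EAR[28-31]
    return {
        "RID": rid,
        # EAR[28] is driven on TBST without inversion (active high).
        "TBST_level": (rid >> 3) & 1,
        # Assumed mapping: EAR[29-31] appear on TSIZ[0-2].
        "TSIZ": rid & 0b111,
    }


print(eciwx_ecowx_rid_signals(0x8000000B))
# {'RID': 11, 'TBST_level': 1, 'TSIZ': 3}
```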








8.3.2.2.3 Write-Through (WT) Signal 


The MPC750 provides the WT signal to indicate a write-through operation as determined 
by the WIM bit settings during address translation by the MMU. The WT signal is also 
asserted for burst writes due to the execution of the dcbf and dcbst instructions, and snoop
push operations. The WT signal is deasserted for accesses caused by the execution of the 
ecowx instruction. During read operations the MPC750 uses the WT signal to indicate 
whether the transaction is an instruction fetch (WT set to 1), or a data read operation (WT 
cleared to 0). 




8.3.2.2.4 Cache Inhibit (CI) Signal


The MPC750 indicates the caching-inhibited status of a transaction (determined by the 
setting of the WIM bits by the MMU) through the use of the CI signal. The CI signal is 
asserted even if the L1 caches are disabled or locked. This signal is also asserted for bus 
transactions caused by the execution of eciwx and ecowx instructions independent of the 
address translation. 


8.3.2.3 Burst Ordering During Data Transfers


During burst data transfer operations, 32 bytes of data (one cache line) are transferred to or 
from the cache in order. Burst write transfers are always performed zero double word first, 
but since burst reads are performed critical double word first, a burst read transfer may not 
start with the first double word of the cache line, and the cache line fill may wrap around 
the end of the cache line. 


Table 8-2 describes the data bus burst ordering. 
Table 8-2. Burst Ordering

                    For Starting Address:
Data Transfer       A[27-28] = 00   A[27-28] = 01   A[27-28] = 10   A[27-28] = 11
First data beat     DW0             DW1             DW2             DW3
Second data beat    DW1             DW2             DW3             DW0
Third data beat     DW2             DW3             DW0             DW1
Fourth data beat    DW3             DW0             DW1             DW2

Note: A[29-31] are always 0b000 for burst transfers by the MPC750.
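

The wrap-around ordering in Table 8-2 follows a simple modulo rule, sketched below. The function names are hypothetical. The address is treated as a 32-bit physical address in which A[27-28] select the double word within the 32-byte cache line.

```python
def burst_order(start_dw):
    """Return the double-word indices in the order the four beats carry them."""
    if not 0 <= start_dw <= 3:
        raise ValueError("A[27-28] selects one of four double words")
    return [(start_dw + beat) % 4 for beat in range(4)]


def burst_read_order_for_address(addr):
    """Critical-double-word-first order for a burst read of `addr`."""
    start_dw = (addr >> 3) & 0x3     # A[27-28] in PowerPC bit numbering
    return burst_order(start_dw)


print(burst_read_order_for_address(0x00001018))   # [3, 0, 1, 2]
print(burst_order(0))                             # [0, 1, 2, 3]
```

Burst writes always correspond to burst_order(0), because they are performed zero double word first.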


8.3.2.4 Effect of Alignment in Data Transfers 


Table 8-3 lists the aligned transfers that can occur on the MPC750 bus. These are transfers 
in which the data is aligned to an address that is an integral multiple of the size of the data. 
For example, Table 8-3 shows that 1-byte data is always aligned; however, for a 4-byte 
word to be aligned, it must be oriented on an address that is a multiple of 4.


Table 8-3. Aligned Data Transfers

                                                  Data Bus Byte Lane(s)
Transfer Size  TSIZ0  TSIZ1  TSIZ2  A[29-31]      0  1  2  3  4  5  6  7
Byte           0      0      1      000           √  -  -  -  -  -  -  -
               0      0      1      001           -  √  -  -  -  -  -  -
               0      0      1      010           -  -  √  -  -  -  -  -
               0      0      1      011           -  -  -  √  -  -  -  -
               0      0      1      100           -  -  -  -  √  -  -  -
               0      0      1      101           -  -  -  -  -  √  -  -
               0      0      1      110           -  -  -  -  -  -  √  -
               0      0      1      111           -  -  -  -  -  -  -  √
Half word      0      1      0      000           √  √  -  -  -  -  -  -
               0      1      0      010           -  -  √  √  -  -  -  -
               0      1      0      100           -  -  -  -  √  √  -  -
               0      1      0      110           -  -  -  -  -  -  √  √
Word           1      0      0      000           √  √  √  √  -  -  -  -
               1      0      0      100           -  -  -  -  √  √  √  √
Double word    0      0      0      000           √  √  √  √  √  √  √  √

Notes:
√  These entries indicate the byte portions of the requested operand that are read or
   written during that bus transaction.
-  These entries are not required and are ignored during read transactions and are
   driven with undefined data during all write transactions.
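

The byte-lane pattern of Table 8-3 can be derived from the transfer size and A[29-31], as in the sketch below. The function name is hypothetical, and byte lane 0 is taken to be the lane used by a byte access at A[29-31] = 000, as in the table.

```python
def aligned_byte_lanes(size, a29_31):
    """Return the data bus byte lanes (0..7) used by an aligned transfer.

    size    transfer size in bytes: 1, 2, 4, or 8
    a29_31  the low three address bits, 0..7
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("aligned transfers are 1, 2, 4, or 8 bytes")
    if not 0 <= a29_31 <= 7:
        raise ValueError("A[29-31] is a 3-bit field")
    if a29_31 % size != 0:
        raise ValueError("address is not a multiple of the transfer size")
    return list(range(a29_31, a29_31 + size))


print(aligned_byte_lanes(2, 0b100))   # [4, 5]
print(aligned_byte_lanes(4, 0b000))   # [0, 1, 2, 3]
```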


The MPC750 supports misaligned memory operations, although their use may 
substantially degrade performance. Misaligned memory transfers address memory that is 
not aligned to the size of the data being transferred (for example, a word read from an odd byte
address). Although most of these operations hit in the primary cache (or generate burst 
memory operations if they miss), the MPC750 interface supports misaligned transfers 
within a word (32-bit aligned) boundary, as shown in Table 8-4. Note that the 4-byte 
transfer in Table 8-4 is only one example of misalignment. As long as the attempted transfer 
does not cross a word boundary, the MPC750 can transfer the data on the misaligned 
address (for example, a half-word read from an odd byte-aligned address). An attempt to 
address data that crosses a word boundary requires two bus transfers to access the data. 


Due to the performance degradations associated with misaligned memory operations, they 
are best avoided. In addition to the double-word straddle boundary condition, the address 
translation logic can generate substantial exception overhead when the load/store multiple 
and load/store string instructions access misaligned data. It is strongly recommended that 
software attempt to align data where possible. 




Table 8-4. Misaligned Data Transfers (Four-Byte Examples)

Transfer Size                                        Data Bus Byte Lanes
(Four Bytes)               TSIZ[0-2]  A[29-31]       0  1  2  3  4  5  6  7
Aligned                    100        000            A  A  A  A  -  -  -  -
Misaligned: first access   011        001            -  A  A  A  -  -  -  -
            second access  001        100            -  -  -  -  A  -  -  -
Misaligned: first access   010        010            -  -  A  A  -  -  -  -
            second access  010        100            -  -  -  -  A  A  -  -
Misaligned: first access   001        011            -  -  -  A  -  -  -  -
            second access  011        100            -  -  -  -  A  A  A  -
Aligned                    100        100            -  -  -  -  A  A  A  A
Misaligned: first access   011        101            -  -  -  -  -  A  A  A
            second access  001        000            A  -  -  -  -  -  -  -
Misaligned: first access   010        110            -  -  -  -  -  -  A  A
            second access  010        000            A  A  -  -  -  -  -  -
Misaligned: first access   001        111            -  -  -  -  -  -  -  A
            second access  011        000            A  A  A  -  -  -  -  -

Notes:
A: Byte lane used
-: Byte lane not used
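

The rows of Table 8-4 can be reproduced by splitting a four-byte access at the word boundary, as sketched below. The function name and the tuple layout are assumptions. The sketch covers only operands of up to four bytes, which is the case the table illustrates.

```python
def split_at_word_boundary(addr, size=4):
    """Split an access of up to four bytes into bus transfers.

    Returns a list of (bytes, a29_31) tuples, one per bus transfer.
    For these sizes the byte count equals the TSIZ[0-2] encoding.
    """
    if not 1 <= size <= 4:
        raise ValueError("this sketch handles operands of 1 to 4 bytes")
    transfers = []
    while size > 0:
        room = 4 - (addr & 0x3)          # bytes left before the word boundary
        chunk = min(size, room)
        transfers.append((chunk, addr & 0x7))
        addr += chunk
        size -= chunk
    return transfers


print(split_at_word_boundary(0x1000))   # [(4, 0)]          aligned
print(split_at_word_boundary(0x1001))   # [(3, 1), (1, 4)]  two transfers
print(split_at_word_boundary(0x1007))   # [(1, 7), (3, 0)]  next double word
```

Each tuple corresponds to one table row: the first element is the TSIZ[0-2] value and the second is A[29-31] for that access.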


8.3.2.4.1 Alignment of External Control Instructions


The size of the data transfer associated with the eciwx and ecowx instructions is always 
4 bytes. If the eciwx or ecowx instruction is misaligned and crosses any word boundary, the 
MPC750 will generate an alignment exception. 


8.3.3 Address Transfer Termination


The address tenure of a bus operation is terminated when completed with the assertion of 
AACK, or retried with the assertion of ARTRY. The MPC750 does not terminate the 
address transfer until the AACK (address acknowledge) input is asserted; therefore, the 
system can extend the address transfer phase by delaying the assertion of AACK to the 
MPC750. The assertion of AACK can be as early as the bus clock cycle following TS (see 
Figure 8-7), which allows a minimum address tenure of two bus cycles. As shown in 
Figure 8-7, these signals are asserted for one bus clock cycle, three-stated for half of the 
next bus clock cycle, driven high till the following bus cycle, and finally three-stated. Note 
that AACK must be asserted for only one bus clock cycle.


The address transfer can be terminated with the requirement to retry if ARTRY is asserted
anytime during the address tenure and through the cycle following AACK. The assertion 
causes the entire transaction (address and data tenure) to be rerun. As a snooping device, 
the MPC750 asserts ARTRY for a snooped transaction that hits modified data in the data 
cache that must be written back to memory, or if the snooped transaction could not be 
serviced. As a bus master, the MPC750 responds to an assertion of ARTRY by aborting the 
bus transaction and re-requesting the bus. Note that after recognizing an assertion of 
ARTRY and aborting the transaction in progress, the MPC750 is not guaranteed to run the 
same transaction the next time it is granted the bus due to internal reordering of load and 
store operations. 














If an address retry is required, the ARTRY response will be asserted by a bus snooping 
device as early as the second cycle after the assertion of TS. Once asserted, ARTRY must 
remain asserted through the cycle after the assertion of AACK. The assertion of ARTRY 
during the cycle after the assertion of AACK is referred to as a qualified ARTRY. An earlier 
assertion of ARTRY during the address tenure is referred to as an early ARTRY. 
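

The distinction between an early and a qualified ARTRY can be expressed in terms of bus clock numbers, as in the sketch below. The function names and the use of integer cycle numbers are assumptions, and the earliest permitted assertion relative to TS is not checked.

```python
def classify_artry(aack_cycle, first_artry_cycle):
    """Classify an ARTRY assertion relative to the cycle in which AACK is asserted."""
    if first_artry_cycle == aack_cycle + 1:
        return "qualified"           # asserted in the cycle after AACK
    if first_artry_cycle <= aack_cycle:
        return "early"               # asserted during the address tenure
    return "outside window"          # too late to retry this address tenure


def cycles_artry_must_be_held(aack_cycle, first_artry_cycle):
    """Cycles in which ARTRY stays asserted: through the cycle after AACK."""
    if first_artry_cycle > aack_cycle + 1:
        return []
    return list(range(first_artry_cycle, aack_cycle + 2))


# AACK delayed to cycle 5, snooper asserts ARTRY in cycle 3.
print(classify_artry(5, 3))                # early
print(cycles_artry_must_be_held(5, 3))     # [3, 4, 5, 6]
```

An early ARTRY therefore always extends into the qualified window, which is the cycle the bus master relies on.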























As a bus master, the MPC750 recognizes either an early or qualified ARTRY and prevents 
the data tenure associated with the retried address tenure. If the data tenure has already 
begun, the MPC750 aborts and terminates the data tenure immediately even if the burst data 
has been received. If the assertion of ARTRY is received up to or on the bus cycle following 
the first (or only) assertion of TA for the data tenure, the MPC750 ignores the first data beat, 
and if it is a load operation, does not forward data internally to the cache and execution 
units. If ARTRY is asserted after the first (or only) assertion of TA, improper operation of 
the bus interface may result. 


During the clock of a qualified ARTRY, the MPC750 also determines if it should negate BR 
and ignore BG on the following cycle. On the following cycle, only the snooping master 
that asserted ARTRY and needs to perform a snoop copy-back operation is allowed to assert 
BR. This guarantees the snooping master an opportunity to request and be granted the bus 
before the just-retried master can restart its transaction. Note that a nonclocked bus arbiter 
may detect the assertion of address bus request by the bus master that asserted ARTRY, and 
return a qualified bus grant one cycle earlier than shown in Figure 8-7. 











Note that if the MPC750 asserts ARTRY due to a snoop operation and asserts BR in the
bus cycle following ARTRY to perform a snoop push to memory, several bus cycles
may elapse before the MPC750 is able to accept a BG. (The delay in responding
to the assertion of BG only occurs during snoop pushes from the L2 cache.) The bus arbiter 
should keep BG asserted until it detects BR negated or TS asserted from the MPC750 
indicating that the snoop copy-back has begun. The system should ensure that no other 
address tenures occur until the current snoop push from the MPC750 is completed. Snoop 
push delays can also be avoided by operating the L2 cache in write-through mode so no 
snoop pushes are required by the L2 cache. 

Figure 8-7. Snooped Address Cycle with ARTRY


8.4 Data Bus Tenure 


This section describes the data bus arbitration, transfer, and termination phases defined by 
the MPC750 memory access protocol. The phases of the data tenure are identical to those 
of the address tenure, underscoring the symmetry in the control of the two buses. 


8.4.1. Data Bus Arbitration 


Data bus arbitration uses the data arbitration signal group—DBG, DBWO, and DBB. 
Additionally, the combination of TS and TT[0-4] provides information about the data bus 
request to external logic. 





The TS signal is an implied data bus request from the MPC750; the arbiter must qualify TS 
with the transfer type (TT) encodings to determine if the current address transfer is an 
address-only operation, which does not require a data bus transfer (see Figure 8-7). If the 
data bus is needed, the arbiter grants data bus mastership by asserting the DBG input to the 
MPC750. As with the address bus arbitration phase, the MPC750 must qualify the DBG 
input with a number of input signals before assuming bus mastership, as shown in 
Figure 8-8. 

Figure 8-8. Data Bus Arbitration


A qualified data bus grant can be expressed as the following: 


QDBG = DBG asserted while DBB, DRTRY, and ARTRY (associated with the data 
bus operation) are negated. 
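
The QDBG expression can be written as a small predicate. The sketch below is illustrative only: it assumes each signal is modeled as a boolean that is True when the signal is asserted, and the function name is hypothetical.

```python
def qualified_dbg(dbg, dbb, drtry, artry):
    """Return True when a data bus grant is qualified.

    Each argument is True when the corresponding signal is asserted.
    `artry` covers only an ARTRY associated with this data bus operation.
    """
    return dbg and not (dbb or drtry or artry)
```

A grant that arrives while DBB is still asserted by the current data bus master therefore does not qualify until DBB is negated.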





When a data tenure overlaps with its associated address tenure, a qualified ARTRY 
assertion coincident with a data bus grant signal does not result in data bus mastership 
(DBB is not asserted). Otherwise, the MPC750 always asserts DBB on the bus clock cycle 
after recognition of a qualified data bus grant. Since the MPC750 can pipeline transactions, 
there may be an outstanding data bus transaction when a new address transaction is retried. 
In this case, the MPC750 becomes the data bus master to complete the previous transaction. 








8.4.1.1. Using the DBB Signal 


The DBB signal should be connected between masters if data tenure scheduling is left to 
the masters. Optionally, the memory system can control data tenure scheduling directly 
with DBG. However, it is possible to ignore the DBB signal in the system if the DBB input 
is not used as the final data bus allocation control between data bus masters, and if the 
memory system can track the start and end of the data tenure. If DBB is not used to signal 
the end of a data tenure, DBG is only asserted to the next bus master the cycle before the 
cycle that the next bus master may actually begin its data tenure, rather than asserting it 
earlier (usually during another master’s data tenure) and allowing the negation of DBB to 
be the final gating signal for a qualified data bus grant. Even if DBB is ignored in the 
system, the MPC750 always recognizes its own assertion of DBB, and requires one cycle 
after data tenure completion to negate its own DBB before recognizing a qualified data bus 
grant for another data tenure. If DBB is ignored in the system, it must still be connected to
a pull-up resistor on the MPC750 to ensure proper operation. 


8.4.2. Data Bus Write Only 


As a result of address pipelining, the MPC750 may have up to two data tenures queued to 
perform when it receives a qualified DBG. Generally, the data tenures should be performed 
in strict order (the same order) as their address tenures were performed. The MPC750, 
however, also supports a limited out-of-order capability with the data bus write only 
(DBWO) input. When recognized on the clock of a qualified DBG, DBWO may direct the 
MPC750 to perform the next pending data write tenure even if a pending read tenure would 
have normally been performed first. For more information on the operation of DBWO, refer
to Section 8.10, “Using Data Bus Write Only.” 

















If the MPC750 has any data tenures to perform, it always accepts data bus mastership to 
perform a data tenure when it recognizes a qualified DBG. If DBWO is asserted with a 
qualified DBG and no write tenure is queued to run, the MPC750 still takes mastership of 
the data bus to perform the next pending read data tenure. 
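
The selection rule described above can be modeled as follows. This is a simplified sketch, assuming the pending data tenures are held in a list in the order their address tenures completed; the function name and the "read"/"write" labels are hypothetical.

```python
def next_data_tenure(pending, dbwo):
    """Pick the data tenure to run when a qualified DBG is recognized.

    `pending` lists queued tenures ("read" or "write") in address-tenure
    order. Returns the index of the tenure to run, or None if the queue
    is empty.
    """
    if not pending:
        return None
    if dbwo:
        for index, kind in enumerate(pending):
            if kind == "write":
                # DBWO lets the next pending write run ahead of a read.
                return index
    # Strict order otherwise, including DBWO with no write queued.
    return 0
```

With a read queued ahead of a write, `next_data_tenure(["read", "write"], True)` selects the write, while the same call with `dbwo` False selects the read.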











Generally, DBWO should only be used to allow a copy-back operation (burst write) to 
occur before a pending read operation. If DBWO is used for single-beat write operations, 
it may negate the effect of the eieio instruction by allowing a write operation to precede a 
program-scheduled read operation. 





8.4.3 Data Transfer 


The data transfer signals include DH[0-31], DL[0-31], and DP[0-7]. For memory
accesses, the DH and DL signals form a 64-bit data path for read and write operations. 


The MPC750 transfers data in either single- or four-beat burst transfers. Single-beat 
operations can transfer from 1 to 8 bytes at a time and can be misaligned; see 
Section 8.3.2.4, “Effect of Alignment in Data Transfers.” Burst operations always transfer 
eight words and are aligned on eight-word address boundaries. Burst transfers can achieve 
significantly higher bus throughput than single-beat operations. 


The type of transaction initiated by the MPC750 depends on whether the code or data is 
cacheable and, for store operations, whether the cache is in write-back or write-through
mode, which software controls on either a page or block basis. Burst transfers support 
cacheable operations only; that is, memory structures must be marked as cacheable (and 
write-back for data store operations) in the respective page or block descriptor to take 
advantage of burst transfers. 


The MPC750 output TBST indicates to the system whether the current transaction is a 
single- or four-beat transfer (except during eciwx/ecowx transactions, when it signals the 
state of EAR[28]). A burst transfer has an assumed address order. For load or store 
operations that miss in the cache (and are marked as cacheable and, for stores, write-back 
in the MMU), the MPC750 uses the double-word-aligned address associated with the
critical code or data that initiated the transaction. This minimizes latency by allowing the 
critical code or data to be forwarded to the processor before the rest of the cache line is 
filled. For all other burst operations, however, the cache line is transferred beginning with 
the eight-word-aligned data. 
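
The two starting-address cases can be illustrated with a short sketch. It assumes a 32-byte (eight-word) line moved as four 8-byte beats, and it assumes that beats after the first wrap around within the line; that wrap order, like the function and constant names, is an assumption of this example rather than something stated in this section.

```python
LINE_BYTES = 32  # eight words per burst
BEAT_BYTES = 8   # one double word per beat on the 64-bit data path


def burst_beat_addresses(addr, critical_first=True):
    """Return the four double-word addresses of a burst in transfer order.

    With critical_first=True the burst starts at the double word that
    contains `addr`; otherwise it starts at the eight-word-aligned base.
    """
    line_base = addr & ~(LINE_BYTES - 1)
    if critical_first:
        start = (addr & (LINE_BYTES - 1)) // BEAT_BYTES
    else:
        start = 0
    # Assumption: subsequent beats wrap around within the cache line.
    return [line_base + ((start + i) % 4) * BEAT_BYTES for i in range(4)]
```

For a cache miss at address 0x1014, this model starts the burst at 0x1010; with `critical_first=False` it starts at 0x1000.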


8.4.4 Data Transfer Termination 


Four signals are used to terminate data bus transactions—TA, DRTRY (data retry), TEA 
(transfer error acknowledge), and ARTRY. The TA signal indicates normal termination of 
data transactions. It must always be asserted on the bus cycle coincident with the data that 
it is qualifying. It may be withheld by the slave for any number of clocks until valid data is
ready to be supplied or accepted. DRTRY indicates invalid read data in the previous bus 
clock cycle. DRTRY extends the current data beat and does not terminate it. If it is asserted 
after the last (or only) data beat, the MPC750 negates DBB but still considers the data beat 
active and waits for another assertion of TA. DRTRY is ignored on write operations. TEA 
indicates a nonrecoverable bus error event. Upon receiving a final (or only) termination 
condition, the MPC750 always negates DBB for one cycle. 











If DRTRY is asserted by the memory system to extend the last (or only) data beat past the 
negation of DBB, the memory system should three-state the data bus on the clock after the 
final assertion of TA, even though it will negate DRTRY on that clock. This is to prevent a 
potential momentary data bus conflict if a write access begins on the following cycle. 








The TEA signal is used to signal a nonrecoverable error during the data transaction. It may 
be asserted on any cycle during DBB, or on the cycle after a qualified TA during a read 
operation, except when no-DRTRY mode is selected (where no-DRTRY mode cancels 
checking the cycle after TA). The assertion of TEA terminates the data tenure immediately 
even if in the middle of a burst; however, it does not prevent incorrect data that has just been 
acknowledged with TA from being written into the MPC750’s cache or GPRs. The 
assertion of TEA initiates either a machine check exception or a checkstop condition based 
on the setting of the MSR[ME] bit. 











An assertion of ARTRY causes the data tenure to be terminated immediately if the ARTRY 
is for the address tenure associated with the data tenure in operation. If ARTRY is 
connected for the MPC750, the earliest allowable assertion of TA to the MPC750 is directly 
dependent on the earliest possible assertion of ARTRY to the MPC750; see Section 8.3.3, 
“Address Transfer Termination.” 


8.4.4.1. Normal Single-Beat Termination 


Normal termination of a single-beat data read operation occurs when TA is asserted by a 
responding slave. The TEA and DRTRY signals must remain negated during the transfer 
(see Figure 8-9). 

Figure 8-9. Normal Single-Beat Read Termination


The DRTRY signal is not sampled during data writes, as shown in Figure 8-10. 





Figure 8-10. Normal Single-Beat Write Termination 


8.4.4.2 Normal Burst Termination


Normal termination of a burst transfer occurs when TA is asserted for four bus clock cycles, 
as shown in Figure 8-11. The bus clock cycles in which TA is asserted need not be 
consecutive, thus allowing pacing of the data transfer beats. For read bursts to terminate 
successfully, TEA and DRTRY must remain negated during the transfer. For write bursts, 
TEA must remain negated for a successful transfer. DRTRY is ignored during data writes. 
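
Because the four TA assertions need not be consecutive, completion of a burst can be found by counting them. The following sketch is illustrative only: it assumes TA is sampled once per bus clock cycle into a list of booleans, and that TEA and DRTRY stay negated throughout. The function name is hypothetical.

```python
def burst_complete_cycle(ta_by_cycle, beats=4):
    """Return the index of the cycle in which the final beat is acknowledged.

    `ta_by_cycle` holds one boolean per bus clock cycle, True where TA is
    asserted. Returns None if fewer than `beats` assertions are seen.
    """
    seen = 0
    for cycle, ta in enumerate(ta_by_cycle):
        if ta:
            seen += 1
            if seen == beats:
                return cycle
    return None
```

A slave that inserts wait states between beats simply pushes the returned cycle index later; the number of beats is unchanged.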











Figure 8-11. Normal Burst Transaction 


For read bursts, DRTRY may be asserted one bus clock cycle after TA is asserted to signal 
that the data presented with TA is invalid and that the processor must wait for the negation 
of DRTRY before forwarding data to the processor (see Figure 8-12). Thus, a data beat can
be provisionally terminated with TA and then confirmed one bus clock cycle later
by the negation of DRTRY. The DRTRY signal is valid only for read transactions. TA
must be asserted on the bus clock cycle before the first bus clock cycle of the assertion of 
DRTRY; otherwise the results are undefined. 


The DRTRY signal extends data bus mastership such that other processors cannot use the 
data bus until DRTRY is negated. Therefore, in the example in Figure 8-12, DBB cannot 
be asserted until bus clock cycle 6. This is true for both read and write operations even 
though DRTRY does not extend bus mastership for write operations. 

Figure 8-12. Termination with DRTRY


Figure 8-13 shows the effect of using DRTRY during a burst read. It also shows the effect 
of using TA to pace the data transfer rate. Notice that in bus clock cycle 3 of Figure 8-13, 
TA is negated for the second data beat. The MPC750 data pipeline does not proceed until 
bus clock cycle 4 when the TA is reasserted. 





Figure 8-13. Read Burst with TA Wait States and DRTRY


Note that DRTRY is useful for systems that implement predicted forwarding of data such 
as those with direct-mapped, third-level caches where hit/miss is determined on the 
following bus clock cycle, or for parity- or ECC-checked memory systems. 


Note that DRTRY may not be implemented on other processors that implement the 
PowerPC architecture. 


8.4.4.3. Data Transfer Termination Due to a Bus Error


The TEA signal indicates that a bus error occurred. It may be asserted while DBB (and/or 
DRTRY for read operations) is asserted. Asserting TEA to the MPC750 terminates the 
transaction; that is, further assertions of TA and DRTRY are ignored and DBB is negated. 








Assertion of the TEA signal causes a machine check exception (and possibly a checkstop 
condition within the MPC750). For more information, see Section 4.5.2, “Machine Check 
Exception (0x00200).” Note also that the MPC750 does not implement a synchronous error 
capability for memory accesses. This means that the exception instruction pointer saved 
into the SRR0 register does not point to the memory operation that caused the assertion of
TEA, but to the instruction about to be executed (perhaps several instructions later). 
However, assertion of TEA does not invalidate data entering the GPR or the cache. 
Additionally, the address corresponding to the access that caused TEA to be asserted is not 
latched by the MPC750. To recover, the exception handler must determine and remedy the 
cause of the TEA, or the MPC750 must be reset; therefore, this function should only be 
used to indicate fatal system conditions to the processor (such as parity or uncorrectable 
ECC errors). 








After the MPC750 has committed to run a transaction, that transaction must eventually 
complete. Address retry causes the transaction to be restarted; TA wait states and DRTRY 
assertion for reads delay termination of individual data beats. Eventually, however, the 
system must either terminate the transaction or assert the TEA signal. For this reason, care 
must be taken to check for the end of physical memory and the location of certain system 
facilities to avoid memory accesses that result in the assertion of TEA. 





Note that TEA generates a machine check exception depending on MSR[ME]. Clearing 
the machine check exception enable control bits leads to a true checkstop condition 
(instruction execution halted and processor clock stopped). 


8.4.5 Memory Coherency—MEI Protocol 


The MPC750 provides dedicated hardware to provide memory coherency by snooping bus 
transactions. The address retry capability enforces the three-state, MEI cache-coherency 
protocol (see Figure 8-14). 


The global (GBL) output signal indicates whether the current transaction must be snooped 
by other snooping devices on the bus. Address bus masters assert GBL to indicate that the 
current transaction is a global access (that is, an access to memory shared by more than one 
device). If GBL is not asserted for the transaction, that transaction is not snooped. When 
other devices detect the GBL input asserted, they must respond by snooping the broadcast 
address. 











Normally, GBL reflects the M bit value specified for the memory reference in the 
corresponding translation descriptor(s). Note that care must be taken to minimize the 
number of pages marked as global, because the retry protocol discussed in the previous
section is used to enforce coherency and can require significant bus bandwidth. 


When the MPC750 is not the address bus master, GBL is an input. The MPC750 snoops a 
transaction if TS and GBL are asserted together in the same bus clock cycle (this is a 
qualified snooping condition). No snoop update to the MPC750 cache occurs if the snooped 
transaction is not marked global. This includes invalidation cycles. 


When the MPC750 detects a qualified snoop condition, the address associated with the TS 
is compared against the data cache tags. Snooping completes if no hit is detected. If, 
however, the address hits in the cache, the MPC750 reacts according to the MEI protocol 
shown in Figure 8-14, assuming the WIM bits are set to write-back, caching-allowed, and 
coherency-enforced modes (WIM = 001). 


Legend for Figure 8-14:

SH = Snoop Hit
RH = Read Hit
WH = Write Hit
WM = Write Miss
RM = Read Miss
SH/CRW = Snoop Hit, Cacheable Read/Write
SH/CIR = Snoop Hit, Caching-Inhibited Read

Bus transactions shown: Snoop Push, Cache Line Fill

Figure 8-14. MEI Cache Coherency Protocol—State Diagram (WIM = 001) 
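
A three-state MEI protocol of this kind can be expressed as a lookup table. The sketch below uses the event names from the figure legend, but the individual arcs are the conventional MEI transitions and are an assumption of this example; Figure 8-14 remains the authority for the MPC750's actual behavior. The table and function names are hypothetical.

```python
# Assumed arcs of a generic MEI protocol: (state, event) -> (next state,
# bus transaction). States: M = Modified, E = Exclusive, I = Invalid.
MEI_TRANSITIONS = {
    ("I", "RM"): ("E", "cache line fill"),
    ("I", "WM"): ("M", "cache line fill"),
    ("E", "RH"): ("E", None),
    ("E", "WH"): ("M", None),
    ("E", "SH/CRW"): ("I", None),
    ("E", "SH/CIR"): ("E", None),
    ("M", "RH"): ("M", None),
    ("M", "WH"): ("M", None),
    ("M", "SH/CRW"): ("I", "snoop push"),
    ("M", "SH/CIR"): ("E", "snoop push"),
}


def mei_step(state, event):
    """Return (next_state, bus_transaction) for one cache line.

    Pairs not listed in the table leave the state unchanged.
    """
    return MEI_TRANSITIONS.get((state, event), (state, None))
```

In this model, a snoop hit on a modified line always produces a snoop push, which is why the number of pages marked global affects bus bandwidth.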


8.5 Timing Examples


This section shows timing diagrams for various scenarios. Figure 8-15 illustrates the fastest 
single-beat reads possible for the MPC750. This figure shows both minimal latency and 
maximum single-beat throughput. By delaying the data bus tenure, the latency increases, 
but, because of split-transaction pipelining, the overall throughput is not affected unless the 
data bus latency causes the third address tenure to be delayed. 


Note that all bidirectional signals are three-stated between bus tenures. 


Figure 8-15. Fastest Single-Beat Reads

Figure 8-16 illustrates the fastest single-beat writes supported by the MPC750. All
bidirectional signals are three-stated between bus tenures. 


Figure 8-16. Fastest Single-Beat Writes


Figure 8-17 shows three ways to delay single-beat reads using data-delay controls:

• The TA signal can remain negated to insert wait states in clock cycles 3 and 4.
• For the second access, DBG could have been asserted in clock cycle 6.
• In the third access, DRTRY is asserted in clock cycle 11 to flush the previous data.

Note that all bidirectional signals are three-stated between bus tenures. The pipelining
shown in Figure 8-17 can occur if the second access is not another load (for example, an 
instruction fetch). 

Figure 8-17. Single-Beat Reads Showing Data-Delay Controls


Figure 8-18 shows data-delay controls in a single-beat write operation. Note that all 
bidirectional signals are three-stated between bus tenures. Data transfers are delayed in the 
following ways: 

• The TA signal is held negated to insert wait states in clocks 3 and 4.
• In clock 6, DBG is held negated, delaying the start of the data tenure.
• The last access is not delayed (DRTRY is valid only for read operations).


Figure 8-18. Single-Beat Writes Showing Data Delay Controls


Figure 8-19 shows the use of data-delay controls with burst transfers. Note that all 
bidirectional signals are three-stated between bus tenures. Note the following: 


• The first data beat of bursted read data (clock 0) is the critical quad word.
• The write burst shows the use of TA signal negation to delay the third data beat.
• The final read burst shows the use of DRTRY on the third data beat.
• The address for the third transfer is delayed until the first transfer completes.

Figure 8-19. Burst Transfers with Data Delay Controls


Figure 8-20 shows the use of the TEA signal. Note that all bidirectional signals are 
three-stated between bus tenures. Note the following: 


• The first data beat of the read burst (in clock 0) is the critical quad word.
• The TEA signal truncates the burst write transfer on the third data beat.
• The MPC750 eventually causes an exception to be taken on the TEA event.

Figure 8-20. Use of Transfer Error Acknowledge (TEA)


8.6 No-DRTRY Mode 


The MPC750 supports an optional bus configuration that is selected by the assertion or 
negation of the DRTRY signal during the negation of the HRESET signal. The operation 
and selection of the optional bus configuration is described in the following sections. 


The MPC750 supports an optional mode to disable the use of the data retry function 
provided through the DRTRY signal. The no-DRTRY mode allows the forwarding of data 
during load operations to the internal CPU one bus cycle sooner than in the normal bus 
protocol. 

The 60x bus protocol specifies that, during load operations, the memory system normally
has the capability to cancel data that was read by the master on the bus cycle after TA was 
asserted. In the MPC750 implementation, this late cancellation protocol requires the 
MPC750 to hold any loaded data at the bus interface for one additional bus clock to verify 
that the data is valid before forwarding it to the internal CPU. For systems that do not 
implement the DRTRY function, the MPC750 provides an optional no-DRTRY mode that 
eliminates this one-cycle stall during all load operations, and allows for the forwarding of 
data to the internal CPU immediately when TA is recognized. 


When the MPC750 is in the no-DRTRY mode, data can no longer be cancelled the cycle 
after it is acknowledged by an assertion of TA. Data is immediately forwarded to the CPU 
internally, and any attempt at late cancellation by the system may cause improper operation 
by the MPC750. 








When the MPC750 is following normal bus protocol, data may be cancelled the bus cycle 
after TA by either of two means—late cancellation by DRTRY, or late cancellation by 
ARTRY. When no-DRTRY mode is selected, both cancellation cases must be disallowed 
in the system design for the bus protocol. 


When no-DRTRY mode is selected for the MPC750, the system must ensure that DRTRY 
is not asserted to the MPC750. If it is asserted, it may cause improper operation of the bus 
interface. The system must also ensure that any assertion of ARTRY by a snooping device
occurs before or coincident with the first assertion of TA to the MPC750, and not on
the cycle after the first assertion of TA.
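
The ARTRY timing constraint in the two modes can be compared with a small check. This is a sketch under stated assumptions: cycles are integers, `first_ta_cycle` is the cycle of the first (or only) TA assertion for the data tenure, and the function name is hypothetical.

```python
def artry_timing_ok(artry_cycle, first_ta_cycle, no_drtry_mode):
    """Return True if an ARTRY assertion falls within the allowed window.

    Normal protocol: ARTRY may arrive up to the cycle after the first TA.
    No-DRTRY mode: ARTRY must arrive before or coincident with the first TA.
    """
    latest = first_ta_cycle if no_drtry_mode else first_ta_cycle + 1
    return artry_cycle <= latest
```

A snooping device that is only able to respond in the cycle after the first TA is therefore acceptable under the normal protocol but not in no-DRTRY mode.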








Other than the inability to cancel data that was read by the master on the bus cycle after TA 
was asserted, the bus protocol for the MPC750 is identical to that for the basic transfer bus 
protocols described in this chapter. 


The MPC750 selects the desired DRTRY mode at startup by sampling the state of the 
DRTRY signal itself at the negation of the HRESET signal. If the DRTRY signal is negated 
at the negation of HRESET, normal operation is selected. If the DRTRY signal is asserted 
at the negation of HRESET, no-DRTRY mode is selected. 
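
The mode selection and its effect on load data forwarding can be summarized in code. The sketch below is a simplified model, assuming integer bus cycle numbers; the function names and mode strings are hypothetical.

```python
def select_bus_mode(drtry_asserted_at_hreset_negation):
    """Return the bus mode sampled at the negation of HRESET."""
    return "no-DRTRY" if drtry_asserted_at_hreset_negation else "normal"


def load_forward_cycle(ta_cycle, mode):
    """Return the bus cycle in which load data is forwarded internally.

    In normal mode the data is held one additional bus clock so that a
    late cancellation can be observed; in no-DRTRY mode it is forwarded
    as soon as TA is recognized.
    """
    return ta_cycle if mode == "no-DRTRY" else ta_cycle + 1
```

For a TA recognized in cycle 4, the model forwards data in cycle 5 in normal mode and in cycle 4 in no-DRTRY mode.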














8.7 Interrupt, Checkstop, and Reset Signal Operation 


This section describes external interrupts, checkstop operations, and hard and soft reset 
inputs. 


8.7.1. External Interrupts 


The external interrupt input signals (INT, SMI and MCP) of the MPC750 force the
processor to take the external interrupt vector or the system management interrupt vector if
the MSR[EE] bit is set, or the machine check interrupt if the MSR[ME] and the HID0[EMCP]
bits are set.


8.7.2 Checkstops


The MPC750 has two checkstop input signals, CKSTP_IN (nonmaskable) and MCP
(enabled when MSR[ME] is cleared and HID0[EMCP] is set), and a checkstop output
(CKSTP_OUT) signal. If CKSTP_IN or MCP is asserted, the MPC750 halts operations by
gating off all internal clocks. The MPC750 asserts CKSTP_OUT if CKSTP_IN is asserted.
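
The way the MSR[ME] and HID0[EMCP] bits steer the response to an MCP assertion can be sketched as a small decision function. The function name and return strings are hypothetical, and treating MCP as ignored when HID0[EMCP] is cleared is an assumption of this example.

```python
def mcp_response(msr_me, hid0_emcp):
    """Return the modeled response to an assertion of MCP.

    msr_me and hid0_emcp are the values of MSR[ME] and HID0[EMCP].
    """
    if not hid0_emcp:
        # Assumption: with HID0[EMCP] cleared, the MCP input has no effect.
        return "ignored"
    # With MCP enabled, MSR[ME] selects machine check versus checkstop.
    return "machine check" if msr_me else "checkstop"
```

CKSTP_IN is not modeled here because it is nonmaskable and always leads to the checkstop state.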


If CKSTP_OUT is asserted by the MPC750, it has entered the checkstop state, and 
processing has halted internally. The CKSTP_OUT signal can be asserted for various 
reasons including receiving a TEA signal and detection of external parity errors. For more 
information about checkstop state, see Section 4.5.2.2, “Checkstop State (MSR[ME] = 0).” 








8.7.3 Reset Inputs 


The MPC750 has two reset inputs, described as follows: 


• HRESET (hard reset): The HRESET signal is used for power-on reset sequences,
or for situations in which the MPC750 must go through the entire cold start sequence
of internal hardware initializations.


• SRESET (soft reset): The soft reset input provides warm reset capability. This
input can be used to avoid forcing the MPC750 to complete the cold start sequence.


When either reset input negates, the processor attempts to fetch code from the system reset 
exception vector. The vector is located at offset 0x00100 from the exception prefix (all 
zeros or ones, depending on the setting of the exception prefix bit in the machine state
register, MSR[IP]). The MSR[IP] bit is set for HRESET.
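
The resulting fetch address can be computed directly. The sketch below assumes the usual 32-bit PowerPC exception prefixes of 0x000 and 0xFFF for the upper 12 address bits; the function and constant names are hypothetical.

```python
SYSTEM_RESET_OFFSET = 0x00100


def system_reset_vector(msr_ip):
    """Return the physical address of the system reset exception vector.

    Assumes the prefix is 0xFFF when MSR[IP] is set and 0x000 otherwise.
    """
    prefix = 0xFFF00000 if msr_ip else 0x00000000
    return prefix + SYSTEM_RESET_OFFSET
```

Because MSR[IP] is set for HRESET, this model places the first fetch after a hard reset at 0xFFF00100.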


8.7.4 System Quiesce Control Signals 


The system quiesce control signals (QREQ and QACK) allow the processor to enter the nap 
or sleep low-power states, and bring bus activity to a quiescent state in an orderly fashion. 


Prior to entering the nap or sleep power state, the MPC750 asserts the QREQ signal. This 
signal allows the system to terminate or pause any bus activities that are normally snooped. 
When the system is ready to enter the system quiesce state, it asserts the QACK signal. At 
this time the MPC750 may enter a quiescent (low power) state. When the MPC750 is in the 
quiescent state, it stops snooping bus activity. While the MPC750 is in the nap power state, 
the system power controller can enable snooping by the MPC750 by deasserting the QACK 
signal for at least eight bus clock cycles, after which the MPC750 is capable of snooping 
bus transactions. The reassertion of QACK following the snoop transactions will cause the 
MPC750 to reenter the nap power state. 








8.8 Processor State Signals 


This section describes the MPC750's support for atomic memory updates through the
use of the lwarx/stwcx. opcode pair, and includes a description of the TLBISYNC input.


8.8.1. Support for the lwarx/stwcx. Instruction Pair


The Load Word and Reserve Indexed (lwarx) and the Store Word Conditional Indexed
(stwcx.) instructions provide a means for atomic memory updating. Memory can be
updated atomically by setting a reservation on the load and checking that the reservation is 
still valid before the store is performed. In the MPC750, the reservations are made on behalf 
of aligned, 32-byte sections of the memory address space. 
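
Because reservations cover aligned 32-byte sections, two addresses share a reservation when they fall in the same section. The following sketch shows the arithmetic; the function and constant names are hypothetical.

```python
RESERVATION_GRANULE = 32  # bytes, aligned


def reservation_granule(addr):
    """Return the base address of the 32-byte section containing addr."""
    return addr & ~(RESERVATION_GRANULE - 1)


def same_reservation(addr_a, addr_b):
    """Return True if both addresses fall in the same 32-byte section."""
    return reservation_granule(addr_a) == reservation_granule(addr_b)
```

In this model, a store by another device to any byte of the section holding a reserved word affects that reservation, not only a store to the word itself.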


The reservation (RSRV) output signal is driven synchronously with the bus clock and 
reflects the status of the reservation coherency bit in the reservation address register; see 
Chapter 3, “L1 Instruction and Data Cache Operation,” for more information. For
information about timing, see Section 7.2.9.7.3, “Reservation (RSRV)—Output.” 


8.8.2  TLBISYNC Input 


The TLBISYNC input allows for the hardware synchronization of changes to MMU tables 
when the MPC750 and another DMA master share the same MMU translation tables in 
system memory. It is asserted by a DMA master when it is using shared addresses that could 
be changed in the MMU tables by the MPC750 during the DMA master’s tenure. 


The TLBISYNC input, when asserted to the MPC750, prevents the MPC750 from 
completing any instructions past a tlbsync instruction. Generally, during the execution of 
an eciwx or ecowx instruction by the MPC750, the selected DMA device should assert the 
MPC750’s TLBISYNC signal and maintain it asserted during its DMA tenure if it is using 
a shared translation address. Subsequent instructions by the MPC750 should include a sync
and tlbsync instruction before any MMU table changes are performed. This will prevent 
the MPC750 from making table changes disruptive to the other master during the DMA 
period. 


8.9 IEEE 1149.1a-1993 Compliant Interface 


The MPC750 boundary-scan interface is a fully-compliant implementation of the IEEE 
1149.1a-1993 standard. This section describes the MPC750’s IEEE 1149.1a-1993 (JTAG) 
interface. 


8.9.1 JTAG/COP Interface 


The MPC750 has extensive on-chip test capability including the following: 
• Debug control/observation (COP)
• Boundary scan (standard IEEE 1149.1a-1993 (JTAG) compliant interface)
• Support for manufacturing test


The COP and boundary scan logic are not used under typical operating conditions. Detailed 
discussion of the MPC750 test functions is beyond the scope of this document; however, 
sufficient information has been provided to allow the system designer to disable the test
functions that would impede normal operation. 


The JTAG/COP interface is shown in Figure 8-21. For more information, refer to IEEE
Standard Test Access Port and Boundary Scan Architecture, IEEE Std 1149.1a-1993.


TDI (Test Data Input) 
TMS (Test Mode Select) 
TCK (Test Clock Input) 


TDO (Test Data Output) 


TRST (Test Reset) 





Figure 8-21. IEEE 1149.1a-1993 Compliant Boundary Scan Interface 


8.10 Using Data Bus Write Only 


The MPC750 supports split-transaction pipelined transactions. It supports a limited 
out-of-order capability for its own pipelined transactions through the data bus write only 
(DBWO) signal. When recognized on the clock of a qualified DBG, the assertion of DBWO 
directs the MPC750 to perform the next pending data write tenure (if any), even if a pending 
read tenure would have normally been performed because of address pipelining. The 
DBWO signal does not change the order of write tenures with respect to other write tenures 
from the same MPC750. It only allows that a write tenure be performed ahead of a pending 
read tenure from the same MPC750. 














In general, an address tenure on the bus is followed strictly in order by its associated data 
tenure. Transactions pipelined by the MPC750 complete strictly in order. However, the 
MPC750 can run bus transactions out of order only when the external system allows the 
MPC750 to perform a cache-line-snoop-push-out operation (or other write transaction, if 
pending in the MPC750 write queues) between the address and data tenures of a read 
operation through the use of DBWO. This effectively envelops the write operation within
the read operation. Figure 8-22 shows how the DBWO signal is used to perform an 
enveloped write transaction. 

Figure 8-22. Data Bus Write Only Transaction


Note that although the MPC750 can pipeline any write transaction behind the read 
transaction, special care should be used when using the enveloped write feature. It is 
envisioned that most system implementations will not need this capability; for these 
applications, DBWO should remain negated. In systems where this capability is needed, 
DBWO should be asserted under the following scenario: 

1. The MPC750 initiates a read transaction (either single-beat or burst) by completing 
the read address tenure with no address retry. 

2. Then, the MPC750 initiates a write transaction by completing the write address 
tenure, with no address retry. 

3. At this point, if DBWO is asserted with a qualified data bus grant to the MPC750, 
the MPC750 asserts DBB and drives the write data onto the data bus, out of order 
with respect to the address pipeline. The write transaction concludes with the 
MPC750 negating DBB. 

4. The next qualified data bus grant signals the MPC750 to complete the outstanding 
read transaction by latching the data on the bus. This assertion of DBG should not 
be accompanied by an asserted DBWO.

Any number of bus transactions by other bus masters can be attempted between any of these 
steps. 
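
The scenario above can be modeled from the arbiter's point of view. The following C sketch is illustrative only: the mpc750_bus_state structure and both function names are hypothetical, not part of any MPC750 interface, and the model tracks a single pending read and a single queued write, although more than one write may be enveloped in practice.

```c
#include <stdbool.h>

/* Hypothetical arbiter-side bookkeeping for one MPC750 (illustrative names). */
struct mpc750_bus_state {
    bool read_data_pending;  /* read address tenure done, read data tenure outstanding */
    bool write_addr_done;    /* write address tenure done, write data not yet transferred */
};

/* Returns true if DBWO should accompany the next qualified data bus grant.
 * Systems that do not need enveloped writes pass envelope_enabled = false,
 * which keeps DBWO negated at all times. */
bool arbiter_should_assert_dbwo(const struct mpc750_bus_state *s, bool envelope_enabled)
{
    if (!envelope_enabled)
        return false;
    return s->read_data_pending && s->write_addr_done;
}

/* Updates the model after a data tenure granted to this MPC750 has completed. */
void arbiter_data_tenure_done(struct mpc750_bus_state *s, bool dbwo_was_asserted)
{
    if (dbwo_was_asserted && s->write_addr_done)
        s->write_addr_done = false;    /* step 3: the enveloped write has completed */
    else if (s->read_data_pending)
        s->read_data_pending = false;  /* step 4: the outstanding read has completed */
    else
        s->write_addr_done = false;    /* ordinary in-order write data tenure */
}
```

In this model the grant that completes the read is never paired with DBWO, because the write flag is cleared as soon as the enveloped write finishes.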


Note the following regarding DBWO:

• DBWO can be asserted if no data bus read is pending, but it has no effect on write
ordering.

• The ordering and presence of data bus writes is determined by the writes in the write
queues at the time BG is asserted for the write address (not DBG). If a particular
write is desired (for example, a cache-line-snoop-push-out operation), then BG must
be asserted after that particular write is in the queue and it must be the highest
priority write in the queue at that time. A cache-line-snoop-push-out operation may
be the highest priority write, but more than one may be queued.

• Because more than one write may be in the write queue when BG is asserted for
the write address, more than one data bus write may be enveloped by a pending data
bus read.


The arbiter must monitor bus operations and coordinate the various masters and slaves with 
respect to the use of the data bus when DBWO is used. Individual DBG signals associated 
with each bus device should allow the arbiter to synchronize both pipelined and 
split-transaction bus organizations. Individual DBG and DBWO signals provide a primitive 
form of source-level tagging for the granting of the data bus. 





Note that use of the DBWO signal allows some operation-level tagging with respect to the
MPC750 and the use of the data bus.

Chapter 9
L2 Cache Interface Operation 


This chapter describes the MPC750 microprocessor L2 cache interface, and its 
configuration and operation. It describes how the MPC750 signals, defined in Chapter 7, 
“Signal Descriptions,” interact to perform address and data transfers to and from the L2 
cache. Note that the MPC740 microprocessor does not implement the L2 cache interface. 
Also, note that the MPC755 microprocessor is a derivative of the MPC750 and all 
descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, 
“MPC755 Embedded G3 Microprocessor.” 


9.1 L2 Cache Interface Overview 


The MPC750’s L2 cache interface is implemented with an on-chip, two-way set associative 
tag memory with 4096 tags per way, and a dedicated interface with support for up to 
1 Mbyte of external synchronous SRAM for data storage. The tags are sectored to support 
either two cache blocks per tag entry (two sectors, 64 bytes), or four cache blocks per tag 
entry (four sectors, 128 bytes) depending on the L2 cache size. If the L2 cache is configured 
for 256 Kbytes or 512 Kbytes of external SRAM, the tags are configured for two sectors 
per L2 cache block. The L2 tags are configured for four sectors per L2 cache block when 
1 Mbyte of external SRAM is used. Each sector (32-byte L1 cache block) in the L2 cache 
has its own valid and modified bits and other status bits that implement the MEI cache 
coherency protocol. 


The L2 cache control register (L2CR) allows control of L2 cache configuration and timing, 
byte-level data parity generation and checking, global invalidation of L2 contents, 
write-through operation, and L2 test support. The L2 cache interface provides two clock 
outputs that allow the clock inputs of the SRAMs to be driven at select frequency divisions 
of the processor core frequency. The MPC750’s L2 cache normally is configured to operate 
in copy-back mode and maintains cache coherency through snooping. 


Figure 9-1 shows the MPC750 configured with a 1-Mbyte L2 cache.

Notes:

— For a 1-Mbyte L2, use address bits 16-0 (bit 0 is LSB).

— For a 512-Kbyte L2, use address bits 15-0 (bit 0 is LSB).

— For a 256-Kbyte L2, use address bits 14-0 (bit 0 is LSB).

— External clock routing should ensure that the rising edge of the L2 clock is
coincident at the K input of all SRAMs and at the L2SYNC_IN input of the
MPC750. The clock A network can be used solely or the clock B network can
also be used depending on loading, frequency, and number of SRAMs.

— No pull-up resistors are normally required for the L2 interface.

— The MPC750 supports only one bank of SRAMs. 

— For high-speed operation, no more than two loads should be presented on each 
L2 interface signal. 


Figure 9-1. Typical 1-Mbyte L2 Cache Configuration 
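
As a cross-check of the sizes quoted in this section, the sector count and SRAM address width can be derived arithmetically. The following C sketch is illustrative only: the function and macro names are hypothetical, and the 8-byte SRAM word assumes the 64-bit data RAM organization given in the L2SIZ description of Table 9-1. The tags-per-way value it yields for a 256-Kbyte cache (2048) is simply the result of the arithmetic and is not a figure stated in this manual.

```c
#include <stdint.h>

#define L2_WAYS            2u   /* two-way set associative tag memory */
#define L2_SECTOR_BYTES    32u  /* one sector is one 32-byte L1 cache block */
#define L2_SRAM_WORD_BYTES 8u   /* assumed 64-bit-wide data RAM organization */

/* Sectors per tag entry: 2 for 256 or 512 Kbytes, 4 for 1 Mbyte, 0 if unsupported. */
unsigned l2_sectors_per_tag(uint32_t l2_bytes)
{
    switch (l2_bytes) {
    case 256u * 1024u:
    case 512u * 1024u:
        return 2;
    case 1024u * 1024u:
        return 4;
    default:
        return 0;
    }
}

/* Number of SRAM address lines needed to index every 64-bit word. */
unsigned l2_sram_address_lines(uint32_t l2_bytes)
{
    uint32_t words = l2_bytes / L2_SRAM_WORD_BYTES;
    unsigned lines = 0;

    while (words > 1u) {   /* integer log2 of the word count */
        words >>= 1;
        lines++;
    }
    return lines;
}

/* Tag entries per way needed to cover the whole data RAM. */
uint32_t l2_tags_used_per_way(uint32_t l2_bytes)
{
    unsigned sectors = l2_sectors_per_tag(l2_bytes);

    if (sectors == 0)
        return 0;
    return l2_bytes / (L2_WAYS * sectors * L2_SECTOR_BYTES);
}
```

For a 1-Mbyte cache the second function returns 17, which matches the use of address bits 16-0 in the notes to Figure 9-1.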


9.1.1 L2 Cache Operation 


The MPC750’s L2 cache is a combined instruction and data cache that receives memory 
requests from both L1 instruction and data caches independently. The L1 requests are 
generally the result of instruction fetch misses, data load or store misses, write-through 
operations, or cache management instructions. Each L1 request generates an address 
lookup in the L2 tags. If a hit occurs, the instructions or data are forwarded to the L1 cache. 
A miss in the L2 tags causes the L1 request to be forwarded to the 60x bus interface. The 
cache block received from the bus is forwarded to the L1 cache immediately, and is also 
loaded into the L2 cache with the tag marked valid and unmodified. If the cache block 
loaded into the L2 causes a new tag entry to be allocated and the current tag entry is marked 
valid modified, the modified sectors of the tag to be replaced are cast out from the L2 cache
to the 60x bus. 


At any given time the L1 instruction cache may have one instruction fetch request, and the 
L1 data cache may have one load and two stores requesting L2 cache access. The L2 cache 
also services snoop requests from the 60x bus. When there are multiple pending requests to
the L2 cache, snoop requests have highest priority, followed by data load and store requests
(serviced on a first-in, first-out basis). Instruction fetch requests have the lowest priority in 
accessing the L2 cache when there are multiple accesses pending. 
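
The priority rule above can be expressed as a small selection function. This is a minimal sketch, not a description of the hardware arbiter: the l2_request type, its arrival field, and l2_pick_next are hypothetical names, and applying oldest-first ordering to snoops and instruction fetches as well as to data requests is an assumption made only to break ties.

```c
#include <stddef.h>

/* Enumerator order encodes priority: a lower value is serviced first. */
enum l2_req_kind { L2_REQ_SNOOP, L2_REQ_DATA, L2_REQ_IFETCH };

struct l2_request {
    enum l2_req_kind kind;
    unsigned arrival;      /* smaller value means the request arrived earlier */
};

/* Returns the index of the request to service next, or -1 if none is pending. */
int l2_pick_next(const struct l2_request *reqs, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0) {
            best = (int)i;
            continue;
        }
        const struct l2_request *b = &reqs[best];
        const struct l2_request *r = &reqs[i];

        /* Higher-priority kind wins; within a kind, first in is first out. */
        if (r->kind < b->kind ||
            (r->kind == b->kind && r->arrival < b->arrival))
            best = (int)i;
    }
    return best;
}
```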


If read requests from both the L1 instruction and data caches are pending, the L2 cache can 
perform hit-under-miss and supplies the available instruction or data while a bus transaction 
for the previous L2 cache miss is performed. The L2 cache does not support 
miss-under-miss, and the second instruction fetch or data load stalls until the bus operation 
resulting from the first L2 miss completes. 


All requests to the L2 cache that are marked cacheable (even if the respective L1 cache is 
disabled or locked) cause tag lookup and will be serviced if the instructions or data are in 
the L2 cache. Burst and single-beat read requests from the L1 caches that hit in the L2 cache 
are forwarded instructions or data, and the L2 LRU bit for that tag is updated. Burst writes 
from the L1 data cache due to a castout or replacement copyback are written only to the L2 
cache, and the L2 cache sector is marked modified. Designers should note that during burst 
transfers into and out of the L2 cache SRAM array an address is generated by the MPC750 
for each data beat. 


If the L2 cache is configured as write-through, the L2 sector is marked unmodified, and the 
write is forwarded to the 60x bus. If the L1 castout requires a new L2 tag entry to be 
allocated and the current tag is marked modified, any modified sectors of the tag to be 
replaced are cast out of the L2 cache to the 60x bus. 


Single-beat read requests from the L1 caches that miss in the L2 cache do not cause any 
state changes in the L2 cache and are forwarded on the 60x bus interface. Cacheable 
single-beat store requests marked copy-back that hit in the L2 are allowed to update the L2 
cache sector, but do not cause L2 cache sector allocation or deallocation. Cacheable, 
single-beat store requests that miss in the L2 are forwarded to the 60x bus. Single-beat store 
requests marked write-through (through address translation or through the configuration of 
L2CR[L2WT]) are written to the L2 cache if they hit and are written to the 60x bus 
independent of the L2 hit/miss status. If the store hits in the L2 cache, the 
modified/unmodified status of the tag remains unchanged. All requests to the L2 cache that 
are marked cache-inhibited by address translation (through either the MMU or by default 
WIMG configuration) bypass the L2 cache and do not cause any L2 cache tag state change. 


The execution of the stwcx. instruction results in single-beat writes from the L1 data cache.
These single-beat writes are processed by the L2 cache according to hit/miss status, L1 and
L2 write-through configuration, and reservation-active status. If the address associated with
the stwcx. instruction misses in the L2 cache or if the reservation is no longer active, the
stwcx. instruction bypasses the L2 cache and is forwarded to the 60x bus interface. If the
stwcx. hits in the L2 cache and the reservation is still active, one of the following actions
occurs:

• If the stwcx. hits a modified sector in the L2 cache (independent of write-through
status), or if the stwcx. hits both the L1 and L2 caches in copy-back mode, the stwcx.
is written to the L2 and the reservation completes.

• If the stwcx. hits an unmodified sector in the L2 cache, and either the L1 or L2 is in
write-through mode, the stwcx. is forwarded to the 60x bus interface and the sector
hit in the L2 cache is invalidated.
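
The cases above can be summarized as a decision function. The sketch below is illustrative: all type and function names are hypothetical, and it reads "hits both the L1 and L2 caches in copy-back mode" as meaning that both caches are configured for copy-back, which is an interpretation rather than a statement from this manual. Combinations that the text does not describe are reported as such instead of being guessed.

```c
#include <stdbool.h>

enum stwcx_action {
    STWCX_FORWARD_TO_BUS,          /* bypasses the L2, forwarded to the 60x bus */
    STWCX_WRITE_L2,                /* written to the L2, reservation completes */
    STWCX_FORWARD_AND_INVALIDATE,  /* forwarded to the 60x bus, L2 sector invalidated */
    STWCX_NOT_COVERED              /* combination not described in the text */
};

struct stwcx_state {
    bool reservation_active;
    bool l1_hit;
    bool l2_hit;
    bool l2_sector_modified;
    bool l1_write_through;
    bool l2_write_through;
};

enum stwcx_action l2_stwcx_action(const struct stwcx_state *s)
{
    /* L2 miss or lost reservation: the store bypasses the L2 cache. */
    if (!s->l2_hit || !s->reservation_active)
        return STWCX_FORWARD_TO_BUS;

    /* Hit on a modified sector, independent of write-through status. */
    if (s->l2_sector_modified)
        return STWCX_WRITE_L2;

    /* Hit in both caches with both in copy-back mode (interpretation). */
    if (s->l1_hit && !s->l1_write_through && !s->l2_write_through)
        return STWCX_WRITE_L2;

    /* Hit on an unmodified sector with either cache in write-through mode. */
    if (s->l1_write_through || s->l2_write_through)
        return STWCX_FORWARD_AND_INVALIDATE;

    return STWCX_NOT_COVERED;
}
```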


9.1.2 L2 Cache Flushing 


L1 cache-block-push operations generated by the execution of dcbf and dcbst instructions
write through to the 60x bus interface and invalidate the L2 cache sector if they hit. The
execution of dcbf and dcbst instructions that do not cause a cache-block-push from the L1
cache are forwarded to the L2 cache to perform a sector invalidation and/or push from the
L2 cache to the 60x bus as required. If the dcbf and dcbst instructions do not cause a sector
push from the L2 cache, they are forwarded to the 60x bus interface for address-only
broadcast if HID0[ABE] is set to 1.


The dcbi instruction is always forwarded to the L2 cache and causes a segment invalidation
if a hit occurs. The dcbi instruction is also forwarded to the 60x bus interface for broadcast
if HID0[ABE] is set to 1. The icbi instruction invalidates only L1 cache blocks and is never
forwarded to the L2 cache. Any dcbz instructions marked global do not affect the L2 cache
state. If a dcbz instruction hits in the L1 and L2 caches, the L1 data cache block is cleared
and the dcbz instruction completes. If a dcbz instruction misses in the L2 cache, it is
forwarded to the 60x bus interface for broadcast. Any dcbz instructions that are marked
nonglobal act only on the L1 data cache.


The sync and eieio instructions bypass the L2 cache and are forwarded to the 60x bus. 


9.1.3 L2 Cache Control Register (L2CR) 


The L2 cache control register is used to configure and enable the L2 cache. The L2CR is a 
supervisor-level read/write, implementation-specific register that is accessed as SPR 1017. 
The contents of the L2CR are cleared during power-on reset. Table 9-1 describes the L2CR
bits. For additional information about the configuration of the L2CR, refer to Section 2.1.5, 
“L2 Cache Control Register (L2CR).” 


Table 9-1. L2 Cache Control Register

Bits    Name    Function

0       L2E     L2 enable. Enables L2 cache operation (including snooping) starting with the next
                transaction the L2 cache unit receives. Before enabling the L2 cache, the L2 clock
                must be configured through L2CR[L2CLK], and the L2 DLL must stabilize (see the
                hardware specifications). All other L2CR bits must be set appropriately. The L2
                cache may need to be invalidated globally.

1       L2PE    L2 data parity generation and checking enable. Enables parity generation and
                checking for the L2 data RAM interface. When disabled, generated parity is always
                zeros.

2-3     L2SIZ   L2 size. Should be set according to the size of the L2 data RAMs used. A 256-Kbyte
                L2 cache requires a data RAM configuration of 32K x 64 bits; a 512-Kbyte L2 cache
                requires a configuration of 64K x 64 bits; a 1-Mbyte L2 cache requires a
                configuration of 128K x 64 bits.
                00  Reserved
                01  256 Kbyte
                10  512 Kbyte
                11  1 Mbyte

4-6     L2CLK   L2 clock ratio (core-to-L2 frequency divider). Specifies the clock divider ratio,
                relative to the core clock frequency, at which the L2 data RAM interface operates.
                When these bits are cleared, the L2 clock is stopped and the on-chip DLL for the L2
                interface is disabled. For nonzero values, the processor generates the L2 clock and
                the on-chip DLL is enabled. After the L2 clock ratio is chosen, the DLL must
                stabilize before the L2 interface can be enabled (see the hardware specifications).
                The resulting L2 clock frequency cannot be slower than the clock frequency of the
                60x bus interface.
                000  L2 clock and DLL disabled
                001  ÷1
                010  ÷1.5
                011  Reserved
                100  ÷2
                101  ÷2.5
                110  ÷3
                111  Reserved

7-8     L2RAM   L2 RAM type. Configures the L2 RAM interface for the type of synchronous SRAMs
                used:
                • Flow-through (register-buffer) synchronous burst SRAMs that clock addresses in
                  and flow data out
                • Pipelined (register-register) synchronous burst SRAMs that clock addresses in
                  and clock data out
                • Late-write synchronous SRAMs, for which the MPC750 requires a pipelined
                  (register-register) configuration. Late-write RAMs require write data to be valid
                  on the cycle after WE is asserted, rather than on the same cycle as the write
                  enable as with traditional burst RAMs.
                For burst RAM selections, the MPC750 does not burst data into the L2 cache; it
                generates an address for each access. Pipelined SRAMs may be used for all L2 clock
                modes. Note that flow-through SRAMs can be used only for L2 clock modes divide-by-2
                or slower (divide-by-1 and divide-by-1.5 not allowed).
                00  Flow-through (register-buffer) synchronous burst SRAM
                01  Reserved
                10  Pipelined (register-register) synchronous burst SRAM
                11  Pipelined (register-register) synchronous late-write SRAM

9       L2DO    L2 data-only. Setting this bit enables data-only operation in the L2 cache. For
                this operation, only transactions from the L1 data cache can be cached in the L2
                cache, which treats all transactions from the L1 instruction cache as
                cache-inhibited (bypass L2 cache, no L2 checking done). This bit is provided for L2
                testing only.

10      L2I     L2 global invalidate. Setting L2I invalidates the L2 cache globally by clearing the
                L2 bits including status bits. This bit must not be set while the L2 cache is
                enabled.

11      L2CTL   L2 RAM control (ZZ enable). Setting L2CTL enables the automatic operation of the
                L2ZZ (low-power mode) signal for cache RAMs that support the ZZ function. While
                L2CTL is asserted, L2ZZ asserts automatically when the MPC750 enters nap or sleep
                mode and negates automatically when the MPC750 exits nap or sleep mode. This bit
                should not be set when the MPC750 is in nap mode and snooping is to be performed
                through deassertion of QACK.

12      L2WT    L2 write-through. Setting L2WT selects write-through mode (rather than the default
                write-back mode) so all writes to the L2 cache also write through to the 60x bus.
                For these writes, the L2 cache entry is always marked as clean (valid unmodified)
                rather than dirty (valid modified). This bit must never be asserted after the L2
                cache has been enabled, as previously modified lines can get remarked as clean
                during normal operation.

13      L2TS    L2 test support. Setting L2TS causes cache block pushes from the L1 data cache that
                result from dcbf and dcbst instructions to be written only into the L2 cache and
                marked valid, rather than being written only to the 60x bus and marked invalid in
                the L2 cache in case of hit. This bit allows a dcbz/dcbf instruction sequence to be
                used with the L1 cache enabled to easily initialize the L2 cache with any address
                and data information. This bit also keeps dcbz instructions from being broadcast on
                the 60x bus and single-beat cacheable store misses in the L2 from being written to
                the 60x bus.

14-15   L2OH    L2 output hold. These bits configure output hold time for address, data, and
                control signals driven by the MPC750 to the L2 data RAMs. They should generally be
                set according to the SRAM’s input hold time requirements, for which late-write
                SRAMs usually differ from flow-through or burst SRAMs.
                00  0.5 ns
                01  1.0 ns
                1x  Reserved

16      L2SL    L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay line. It
                is intended to increase the delay through the DLL to accommodate slower L2 RAM bus
                frequencies. Generally, L2SL should be set if the L2 RAM interface is operated
                below 150 MHz.

17      L2DF    L2 differential clock. Setting L2DF configures the two clock-out signals
                (L2CLK_OUTA and L2CLK_OUTB) of the L2 interface to operate as one differential
                clock. In this mode, the B clock is driven as the logical complement of the A
                clock. This mode supports the differential clock requirements of late-write SRAMs.
                Generally, this bit should be set when late-write SRAMs are used.

18      L2BYP   L2 DLL bypass. The DLL unit receives three input clocks:
                • A square-wave clock from the PLL unit to phase adjust and export
                • A non-square-wave clock for the internal phase reference
                • A feedback clock (L2SYNC_IN) for the external phase reference
                Asserting L2BYP causes clock #2 to be used as clocks #1 and #2. (Clock #2 is the
                actual clock used by the registers of the L2 interface circuitry.) L2BYP is
                intended for use when the PLL is being bypassed, and for engineering evaluation. If
                the PLL is being bypassed, the DLL must be operated in divide-by-1 mode, and SYSCLK
                must be fast enough for the DLL to support.

19-30   —       Reserved. These bits are implemented but not used; keep at 0 for future
                compatibility.

31      L2IP    L2 global invalidate in progress (read only). This read-only bit indicates whether
                an L2 global invalidate is occurring. It should be monitored after an L2 global
                invalidate has been initiated by the L2I bit to determine when it has completed.
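
The bit assignments in Table 9-1 translate directly into mask definitions. The following C sketch is one possible encoding, not an official header: all macro and function names are hypothetical. It relies on the PowerPC convention that bit 0 is the most-significant bit of the 32-bit register, and it assumes that L2WT, L2TS, L2OH, L2SL, and L2DF occupy bits 12 through 17, the positions between L2CTL (bit 11) and L2BYP (bit 18).

```c
#include <stdint.h>

/* PowerPC bit numbering: bit 0 is the most-significant bit of a 32-bit word. */
#define PPC_BIT(n)              (UINT32_C(1) << (31 - (n)))
#define PPC_FIELD(val, lsb_bit) ((uint32_t)(val) << (31 - (lsb_bit)))

#define L2CR_L2E      PPC_BIT(0)                   /* L2 enable */
#define L2CR_L2PE     PPC_BIT(1)                   /* parity enable */
#define L2CR_L2SIZ(x) PPC_FIELD((x) & 0x3u, 3)     /* bits 2-3 */
#define L2CR_L2CLK(x) PPC_FIELD((x) & 0x7u, 6)     /* bits 4-6 */
#define L2CR_L2RAM(x) PPC_FIELD((x) & 0x3u, 8)     /* bits 7-8 */
#define L2CR_L2DO     PPC_BIT(9)                   /* data-only (test) */
#define L2CR_L2I      PPC_BIT(10)                  /* global invalidate */
#define L2CR_L2CTL    PPC_BIT(11)                  /* ZZ enable */
#define L2CR_L2WT     PPC_BIT(12)                  /* write-through (assumed position) */
#define L2CR_L2TS     PPC_BIT(13)                  /* test support (assumed position) */
#define L2CR_L2OH(x)  PPC_FIELD((x) & 0x3u, 15)    /* bits 14-15 (assumed position) */
#define L2CR_L2SL     PPC_BIT(16)                  /* DLL slow (assumed position) */
#define L2CR_L2DF     PPC_BIT(17)                  /* differential clock (assumed position) */
#define L2CR_L2BYP    PPC_BIT(18)                  /* DLL bypass */
#define L2CR_L2IP     PPC_BIT(31)                  /* invalidate in progress, read only */

/* Field encodings from Table 9-1. */
enum { L2SIZ_256K = 1, L2SIZ_512K = 2, L2SIZ_1M = 3 };
enum { L2CLK_OFF = 0, L2CLK_DIV1 = 1, L2CLK_DIV1_5 = 2,
       L2CLK_DIV2 = 4, L2CLK_DIV2_5 = 5, L2CLK_DIV3 = 6 };
enum { L2RAM_FLOW_THROUGH = 0, L2RAM_PIPELINED = 2, L2RAM_LATE_WRITE = 3 };

/* Builds a configuration value with L2E clear, suitable for the setup phase. */
uint32_t l2cr_compose(unsigned siz, unsigned clk, unsigned ram)
{
    return L2CR_L2SIZ(siz) | L2CR_L2CLK(clk) | L2CR_L2RAM(ram);
}
```

As a usage example, l2cr_compose(L2SIZ_1M, L2CLK_DIV2, L2RAM_PIPELINED) evaluates to 0x39000000 under these definitions.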











9.1.4 L2 Cache Initialization 


Following a power-on or hard reset, the L2 cache and the L2 DLL are disabled initially. 
Before enabling the L2 cache, the L2 DLL must first be configured through the L2CR 
register, and the DLL must be allowed 640 L2 clock periods to achieve phase lock. Before 
enabling the L2 cache, other configuration parameters must be set in the L2CR, and the L2
tags must be globally invalidated. The L2 cache should be initialized during system
start-up. 


The sequence for initializing the L2 cache is as follows:

• Power-on reset (automatically performed by the assertion of the HRESET signal).

• Disable the L2 cache by clearing L2CR[L2E].

• Set the L2CR[L2CLK] bits to the desired clock divider setting. Setting a nonzero
value automatically enables the DLL. All other L2 cache configuration bits should
be set to properly configure the L2 cache interface for the SRAM type, size, and
interface timing required.

• Wait for the L2 DLL to achieve phase lock. This can be timed by setting the
decrementer for a time period equal to 640 L2 clocks, or by performing an L2 global
invalidate.

• Perform an L2 global invalidate. The global invalidate could be performed before
enabling the DLL, or in parallel with waiting for the DLL to stabilize. Refer to
Section 9.1.5, “L2 Cache Global Invalidation,” for more information about L2 cache
global invalidation. Note that a global invalidate always takes much longer than it
takes for the DLL to stabilize.

• After the DLL stabilizes, an L2 global invalidate has been performed, and the other
L2 configuration bits have been set, enable the L2 cache for normal operation by
setting the L2CR[L2E] bit to 1.
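
When the DLL lock time is timed with the decrementer, the 640 L2 clock periods must be converted into decrementer ticks. The helper below is a minimal sketch of that conversion; the function name and both frequency parameters are hypothetical inputs supplied by the caller, and the rate at which the decrementer counts is not taken from this section.

```c
#include <stdint.h>

#define L2_DLL_LOCK_L2_CLOCKS 640u  /* phase-lock time quoted in Section 9.1.4 */

/* Returns the number of decrementer ticks that span at least 640 L2 clock
 * periods, rounding up. l2_hz is the L2 clock frequency and dec_hz is the
 * decrementer tick rate, both in hertz. Returns 0 if the L2 clock is stopped. */
uint32_t l2_dll_lock_dec_ticks(uint32_t l2_hz, uint32_t dec_hz)
{
    uint64_t scaled;

    if (l2_hz == 0u)
        return 0u;
    scaled = (uint64_t)L2_DLL_LOCK_L2_CLOCKS * dec_hz;
    return (uint32_t)((scaled + l2_hz - 1u) / l2_hz);  /* ceiling division */
}
```

For example, with a 200-MHz L2 clock and a decrementer assumed to tick at 25 MHz, the function returns 80 ticks, which corresponds to 3.2 microseconds.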


9.1.5 L2 Cache Global Invalidation 


The L2 cache supports a global invalidation function in which all bits of the L2 tags (tag 
data bits, tag status bits, and LRU bit) are cleared. It is performed by an on-chip hardware 
state machine that sequentially cycles through the L2 tags. The global invalidation function 
is controlled through L2CR[L2I], and it must be performed only while the L2 cache is
disabled. The MPC750 can continue operation during a global invalidation provided the L2 
cache has been properly disabled before the global invalidation operation starts. Note that 
the MPC750 must be operating at full power (low power modes disabled) in order to 
perform L2 cache invalidation. 


The sequence for performing a global invalidation of the L2 cache is as follows:

• Clear the HID0[DPM] bit to zero. Dynamic power management must be disabled.

• Execute a sync instruction to finish any pending store operations in the load/store
unit, disable the L2 cache by clearing L2CR[L2E], and execute an additional sync
instruction after disabling the L2 cache to ensure that any pending operations in the
L2 cache unit have completed.

• Initiate the global invalidation operation by setting the L2CR[L2I] bit to 1.

• Monitor the L2CR[L2IP] bit to determine when the global invalidation operation is
completed (indicated by the clearing of L2CR[L2IP]). The global invalidation
requires approximately 32K core clock cycles to complete.

• After detecting the clearing of L2CR[L2IP], clear L2CR[L2I] and re-enable the L2
cache for normal operation by setting L2CR[L2E]. Also, dynamic power
management can be enabled at this time.
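
The sequence above can be sketched in C against an abstract register interface. Everything named here is hypothetical: the l2_ops callbacks stand in for the mfspr, mtspr, and sync instructions and for the HID0[DPM] update, and the sim_ functions are a host-side stand-in so that the sketch can run without hardware. The simulated L2IP behavior (clearing after three polls) is an arbitrary modeling assumption.

```c
#include <stdint.h>

#define L2CR_L2E  UINT32_C(0x80000000)  /* bit 0 */
#define L2CR_L2I  UINT32_C(0x00200000)  /* bit 10 */
#define L2CR_L2IP UINT32_C(0x00000001)  /* bit 31, read only */

/* Hypothetical access hooks; on hardware these would wrap mfspr/mtspr/sync. */
struct l2_ops {
    uint32_t (*read_l2cr)(void);
    void     (*write_l2cr)(uint32_t value);
    void     (*sync)(void);
    void     (*set_dpm)(int enabled);   /* updates HID0[DPM] */
};

void l2_global_invalidate(const struct l2_ops *ops, int reenable_l2, int reenable_dpm)
{
    uint32_t l2cr;

    ops->set_dpm(0);                         /* DPM must be off during the invalidate */
    ops->sync();                             /* drain pending stores */
    l2cr = ops->read_l2cr() & ~(L2CR_L2E | L2CR_L2IP);
    ops->write_l2cr(l2cr);                   /* disable the L2 cache */
    ops->sync();                             /* let pending L2 operations finish */
    ops->write_l2cr(l2cr | L2CR_L2I);        /* start the invalidation */
    while (ops->read_l2cr() & L2CR_L2IP)
        ;                                    /* tight poll until L2IP clears */
    ops->write_l2cr(l2cr);                   /* clear L2I */
    if (reenable_l2)
        ops->write_l2cr(l2cr | L2CR_L2E);
    if (reenable_dpm)
        ops->set_dpm(1);
}

/* Host-side simulation of the register, for illustration only. */
static uint32_t sim_l2cr;
static int sim_polls_left, sim_syncs, sim_dpm = 1;

static uint32_t sim_read(void)
{
    if ((sim_l2cr & L2CR_L2IP) && --sim_polls_left <= 0)
        sim_l2cr &= ~L2CR_L2IP;              /* invalidation finished */
    return sim_l2cr;
}

static void sim_write(uint32_t v)
{
    v &= ~L2CR_L2IP;                         /* software cannot write L2IP */
    if ((v & L2CR_L2I) && !(sim_l2cr & L2CR_L2I)) {
        v |= L2CR_L2IP;                      /* a rising L2I starts the state machine */
        sim_polls_left = 3;
    } else {
        v |= sim_l2cr & L2CR_L2IP;
    }
    sim_l2cr = v;
}

static void sim_sync(void) { sim_syncs++; }
static void sim_set_dpm(int enabled) { sim_dpm = enabled; }

static const struct l2_ops sim_ops = { sim_read, sim_write, sim_sync, sim_set_dpm };
```

Keeping the poll as a tight loop with no other memory traffic matches the precaution described for dynamic power management.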


If dynamic power management is enabled (HID0[DPM] = 1), a global invalidate of the L2
cache may not properly invalidate the L2 tag memory during the time that the L1 data cache
is waiting for reload data to be received from system memory. During that time, circuitry in
the L1 data cache is stopped to conserve power, which inadvertently affects the state
machine performing the L2 global invalidate operation.


There are two ways to avoid this:

• Be sure DPM = 0 during an L2 cache global invalidation.

• Ensure that the processor is in a tight uninterruptible software loop monitoring the
end of the global invalidate, so that an L1 data cache miss cannot occur that would
initiate a reload from system memory during the global invalidate operation.


9.1.6 L2 Cache Test Features and Methods 


In the course of system power-up, testing may be required to verify the proper operation of 
the L2 tag memory, external SRAM, and overall L2 cache system. The following sections 
describe the MPC750’s features and methods for testing the L2 cache. The L2 cache 
address space should be marked as guarded (G = 1) so spurious load operations are not 
forwarded to the 60x bus interface before branch resolution during L2 cache testing. 


9.1.6.1 L2CR Support for L2 Cache Testing 


L2CR[L2DO] and L2CR[L2TS] support the testing of the L2 cache. L2CR[L2DO] prevents
instructions from being cached in the L2. This allows the L1 instruction cache to remain
enabled during the testing process without having L1 instruction misses affect the contents
of the L2 cache and allows all L2 cache activity to be controlled by program-specified load
and store operations.


L2CR[L2TS] is used with the dcbf and dcbst instructions to push data into the L2 cache.
When L2CR[L2TS] is set, and the L1 data cache is enabled, an instruction loop containing a
dcbf instruction can be used to store any address or data pattern to the L2 cache.
Additionally, 60x bus broadcasting is inhibited when a dcbz instruction is executed. This
allows the use of a dcbz instruction to clear an L1 cache block, followed by a dcbf
instruction to push the cache block into the L2 cache and invalidate the L1 cache block.


When the L2 cache is enabled, cacheable single-beat read operations are allowed to hit in
the L2 cache and cacheable write operations are allowed to modify the contents of the L2
cache when a hit occurs. Cacheable single-beat reads and writes occur when address
translation is disabled (invoking the use of the default WIMG bits (0b0011)), or when
address translation is enabled and accesses are marked as cacheable through the page table
entries or the BATs, and the L1 data cache is disabled or locked. When the L2 cache has
been initialized and the L1 cache has been disabled or locked, load or store instructions then
bypass the L1 cache and hit in the L2 cache directly. When L2CR[L2TS] is set, cacheable
single-beat writes are inhibited from accessing the 60x bus interface after an L2 cache miss.


During L2 cache testing, the performance monitor can be used to count L2 cache hits and 
misses, thereby providing a numerical signature for test routines and a way to verify proper 
L2 cache operation. 


9.1.6.2 L2 Cache Testing 


A typical test for verifying the proper operation of the MPC750’s L2 cache memory 
(external SRAM and tag) would perform the following steps: 


• Initialize the L2 test sequence by disabling address translation to invoke the default
WIMG setting (0b0011). Set L2CR[L2DO] and L2CR[L2TS] and perform a global
invalidation of the L1 data cache and the L2 cache. The L1 instruction cache can
remain enabled to improve execution efficiency.

• Test the L2 cache external SRAM by enabling the L1 data cache and executing a
sequence of dcbz, stw, and dcbf instructions to initialize the L2 cache with a desired
range of consecutive addresses and with cache data consisting of zeros. Once the L2
cache holds a sequential range of addresses, disable the L1 data cache and execute
a series of single-beat load and store operations employing a variety of bit patterns
to test for stuck bits and pattern sensitivities in the L2 cache SRAM. The
performance monitor can be used to verify whether the number of L2 cache hits or
misses corresponds to the tests performed.

• Test the L2 cache tag memory by enabling the L1 data cache and executing a
sequence of dcbz, stw, and dcbf instructions to initialize the L2 cache with a wide
range of addresses and cache data. Once the L2 cache is populated with a known
range of addresses and data, disable the L1 data cache and execute a series of store
operations to addresses not previously in the L2 cache. These store operations
should miss in every case. Note that setting L2CR[L2TS] inhibits L2 cache misses
from being forwarded to the 60x bus interface, thereby avoiding the potential for bus
errors due to addressing hardware or nonexistent memory. The L2 cache then can be
further verified by reading the previously loaded addresses and observing whether
all the tags hit, and that the associated data compares correctly. The performance
monitor can also be used to verify whether the proper number of L2 cache hits and
misses corresponds to the test operations performed.

• The entire L2 cache can be tested by clearing L2CR[L2DO] and L2CR[L2TS], restoring
the L1 and L2 caches to their normal operational state, and executing a
comprehensive test program designed to exercise all the caches. The test program
should include operations that cause L2 hit, reload, and castout activity that can be
subsequently verified through the performance monitor.
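
The single-beat pattern phase of the SRAM test can be sketched as an ordinary memory pattern test. The routine below is a generic illustration with a hypothetical name; it does not perform any of the MPC750-specific setup (L2CR[L2DO], L2CR[L2TS], cache population, or disabling the L1 data cache), which is assumed to have been done beforehand. On a host system it simply exercises the buffer it is given.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes and reads back several bit patterns over a word range.
 * Returns the number of mismatching words observed (0 means pass). */
size_t pattern_test(volatile uint32_t *base, size_t words)
{
    static const uint32_t fixed[] = {
        0x00000000u, 0xFFFFFFFFu, 0xAAAAAAAAu, 0x55555555u
    };
    size_t errors = 0;

    /* Fixed patterns: detect bits stuck at 0 or 1 and adjacent-bit coupling. */
    for (size_t p = 0; p < sizeof fixed / sizeof fixed[0]; p++) {
        for (size_t i = 0; i < words; i++)
            base[i] = fixed[p];
        for (size_t i = 0; i < words; i++)
            if (base[i] != fixed[p])
                errors++;
    }

    /* Walking ones: the set bit position shifts with both pass and word index. */
    for (unsigned bit = 0; bit < 32; bit++) {
        for (size_t i = 0; i < words; i++)
            base[i] = UINT32_C(1) << ((bit + i) % 32);
        for (size_t i = 0; i < words; i++)
            if (base[i] != (UINT32_C(1) << ((bit + i) % 32)))
                errors++;
    }
    return errors;
}
```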


9.1.7 L2 Clock Configuration 


The MPC750 provides a programmable clock for the L2 external synchronous data RAM. 
The clock frequency for the external SRAM is provided by dividing the MPC750’s internal 
clock by ratios of 1, 1.5, 2, 2.5, or 3, programmed through the L2CR[L2CLK] bits. The L2
clock is phase-adjusted to synchronize the clocking of the latches in the MPC750’s L2 
cache interface with the clocking of the external SRAM by means of an on-chip 
delay-locked loop (DLL). 


The ratio selected for the L2 clock is dependent on the frequency supported by the external 
SRAMs, the MPC750’s internal frequency of operation, and the range of phase adjustment 
supported by the L2 DLL. Refer to the MPC750 hardware specifications for additional 
information about L2 clock configuration. 
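
The divider arithmetic can be checked with a short calculation. The following C sketch is illustrative and uses hypothetical function names; the field encodings, the rule that the L2 clock cannot be slower than the 60x bus clock, and the 150-MHz guideline for L2SL are taken from Table 9-1. It does not model the DLL phase-adjustment range or SRAM frequency limits, which come from the hardware specifications.

```c
#include <stdint.h>

/* Returns the L2 clock frequency in hertz for a given L2CR[L2CLK] field value,
 * or 0 if the field selects "disabled" or a reserved encoding. */
uint32_t l2_clock_hz(uint32_t core_hz, unsigned l2clk_field)
{
    /* Divider times two, so that 1.5 and 2.5 remain integers. */
    static const unsigned char div_x2[8] = { 0, 2, 3, 0, 4, 5, 6, 0 };
    unsigned d = div_x2[l2clk_field & 7u];

    return d ? (uint32_t)(((uint64_t)core_hz * 2u) / d) : 0u;
}

/* Nonzero if the setting produces a running L2 clock that is not slower
 * than the 60x bus clock. */
int l2_clock_setting_ok(uint32_t core_hz, uint32_t bus_hz, unsigned l2clk_field)
{
    uint32_t hz = l2_clock_hz(core_hz, l2clk_field);

    return hz != 0u && hz >= bus_hz;
}

/* Nonzero if the general guideline suggests setting L2CR[L2SL]. */
int l2_should_set_l2sl(uint32_t l2_hz)
{
    return l2_hz < 150000000u;
}
```

For a 400-MHz core, a field value of 0b100 (divide-by-2) gives a 200-MHz L2 clock, and 0b101 (divide-by-2.5) gives 160 MHz.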


9.1.8 L2 Cache SRAM Timing Examples 


This section describes the signal timing for the three types of SRAM (flow-through burst 
SRAM, pipelined burst SRAM, and late-write SRAM) supported by the MPC750’s L2 
cache interface. The timing diagrams illustrate the best case logical (ideal, non AC-timing 
accurate) interface operations. For proper interface operation, the designer must select 
SRAMs that support the signal sequencing illustrated in the timing diagrams. Designers
should also note that during burst transfers into and out of the L2 cache SRAM array, an 
address is generated by the MPC750 for each data beat. 


The SRAM selected for a system design is usually a function of desired system 
performance, L2 bus frequency, and SRAM unit cost. The following sections describe the 
operation of the three SRAM types supported by the MPC750, and the design trade-offs 
associated with each. 


9.1.8.1 Flow-Through Burst SRAM 


Flow-through burst SRAMs operate by clocking in the address, and driving the data directly 
to the bus from the SRAM memory array. This behavior allows the flow-through burst 
SRAMs to provide initial read data one cycle sooner than pipelined burst SRAMs, but the
flow-through burst SRAM frequencies available may only support the slowest L2 bus
frequencies. The MPC750 supports flow-through burst SRAM at L2 clock ratios of ÷2,
÷2.5, and ÷3.
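
The restriction on flow-through SRAM can be checked in software before the L2CR is programmed. This is a minimal sketch with hypothetical names, using the L2RAM and L2CLK encodings from Table 9-1. Treating late-write SRAM like pipelined SRAM for the purpose of allowed clock ratios is an assumption, based only on the statement that late-write parts use the pipelined (register-register) configuration.

```c
enum l2ram_type {
    L2RAM_FLOW_THROUGH = 0,  /* L2CR[L2RAM] = 00 */
    L2RAM_PIPELINED    = 2,  /* L2CR[L2RAM] = 10 */
    L2RAM_LATE_WRITE   = 3   /* L2CR[L2RAM] = 11 */
};

/* Returns nonzero if the SRAM type may be used with the given L2CR[L2CLK]
 * field value; reserved and disabled encodings are rejected. */
int l2_ram_clock_combo_ok(unsigned l2ram_field, unsigned l2clk_field)
{
    switch (l2clk_field) {
    case 1: case 2:          /* divide-by-1, divide-by-1.5 */
    case 4: case 5: case 6:  /* divide-by-2, divide-by-2.5, divide-by-3 */
        break;
    default:
        return 0;            /* clock disabled or reserved encoding */
    }

    switch (l2ram_field) {
    case L2RAM_FLOW_THROUGH:
        return l2clk_field >= 4;  /* divide-by-2 or slower only */
    case L2RAM_PIPELINED:
        return 1;                 /* all L2 clock modes */
    case L2RAM_LATE_WRITE:
        return 1;                 /* assumed to follow the pipelined rule */
    default:
        return 0;                 /* reserved RAM type */
    }
}
```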


Figure 9-2 shows a burst read-write-read memory access sequence when the L2 cache 
interface is configured with flow-through burst SRAM. 


[Timing diagram: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and SRAMData waveforms]

Note:
Rext indicates where an extra read cycle is signaled to keep the burst RAM driving the
data bus for the last read.


Figure 9-2. Burst Read-Write-Read L2 Cache Access (Flow-Through) 


Figure 9-3 shows a burst read-modify-write memory access sequence when the L2 cache 
interface is configured with flow-through burst SRAM. 


[Timing diagram: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and SRAMData waveforms]

Note:
Rext indicates where an extra read cycle is signaled to keep the burst RAM driving the
data bus for the last read.


Figure 9-3. Burst Read-Modify-Write L2 Cache Access (Flow-Through) 


Figure 9-4 shows a burst read-write-write memory access sequence when the L2 cache 
interface is configured with flow-through burst SRAM. 


[Timing diagram: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and SRAMData waveforms]

Note:
Rext indicates where an extra read cycle is signaled to keep the burst RAM driving the
data bus for the last read.


Figure 9-4. Burst Read-Write-Write L2 Cache Access (Flow-Through)


9.1.8.2 Pipelined Burst SRAM


Pipelined burst SRAMs operate at higher frequencies than flow-through burst SRAMs by 
clocking the read data from the memory array into a buffer before driving the data onto the 
data bus. This causes initial read accesses by the pipelined burst SRAMs to occur one cycle 
later than flow-through burst SRAMs, but the L2 bus frequencies supported can be higher. 
Note that the MPC750’s L2 cache interface requires the use of single-cycle deselect 
pipelined burst SRAM for proper operation. 


Figure 9-5 shows a burst read-write-read memory access sequence when the L2 cache 
interface is configured with pipelined burst SRAM. 


[Timing diagram not reproduced: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and
SRAMData waveforms for a burst read (R0 to R3), a burst write (W4 to W7), and a burst
read (R8 to R11).]

Notes:
Rearly indicates where some burst RAMs may begin driving the data bus.
Rext indicates where an extra read cycle is signaled to keep the burst RAM driving the
data bus for the last read.


Figure 9-5. Burst Read-Write-Read L2 Cache Access (Pipelined) 


Figure 9-6 shows a burst read-modify-write memory access sequence when the L2 cache 
interface is configured with pipelined burst SRAM. 


[Timing diagram not reproduced: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and
SRAMData waveforms for two burst reads, a read-modify-write, and a burst write.]

Notes:
Rearly indicates where some burst RAMs may begin driving the data bus.
Rext indicates where an extra read cycle is signaled to keep the burst RAM driving the
data bus for the last read.


Figure 9-6. Burst Read-Modify-Write L2 Cache Access (Pipelined) 


Figure 9-7 shows a burst read-write-write memory access sequence when the L2 cache 
interface is configured with pipelined burst SRAM. 


[Timing diagram not reproduced: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and
SRAMData waveforms for a burst read, an aborted read, and burst writes.]

Notes:
Rearly indicates where some burst RAMs may begin driving the data bus.
Rext indicates where an extra read cycle is signaled to keep the burst RAM driving the
data bus for the last read.


Figure 9-7. Burst Read-Write-Write L2 Cache Access (Pipelined) 


9.1.8.3 Late-Write SRAM 


Late-write SRAMs offer improved performance when compared to pipelined burst SRAMs 
by not requiring an extra read cycle during read operations, and requiring one cycle less 
when transitioning from a read to write operation. Late-write SRAMs implement an 
internal write queue, allowing write data to be provided one cycle after the write operation 
is signaled on the address and control buses. In this way write operations are queued on the 
address and data bus in the same way as read operations, allowing transitions between read 
and write operations to occur more efficiently. 


Figure 9-8 shows a burst read-write-read memory access sequence when the L2 cache 
interface is configured with late-write SRAM. 


[Timing diagram not reproduced: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and
SRAMData waveforms for a burst read (R0 to R3), a burst write (W4 to W7), and a burst
read (R8 to R11).]

Note:
WQ is the last previous write that was queued in the late-write RAM.


Figure 9-8. Burst Read-Write-Read L2 Cache Access (Late-Write SRAM) 


Figure 9-9 shows a burst read-modify-write memory access sequence when the L2 cache 
interface is configured with late-write SRAM. 


[Timing diagram not reproduced: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and
SRAMData waveforms for two burst reads, a read-modify-write, and a burst write.]

Note:
WQ is the last previous write that was queued in the late-write RAM.


Figure 9-9. Burst Read-Modify-Write L2 Cache Access (Late-Write SRAM) 


Figure 9-10 shows a burst read-write-write memory access sequence when the L2 cache 
interface is configured with late-write SRAM. 


[Timing diagram not reproduced: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and
SRAMData waveforms for a sequence that includes an aborted read.]

Note:
WQ is the last previous write that was queued in the late-write RAM.


Figure 9-10. Burst Read-Write-Write L2 Cache Access (Late-Write SRAM) 


Chapter 10
Power and Thermal Management 


The MPC750 microprocessor is specifically designed for low-power operation. It provides 
both automatic and program-controlled power reduction modes for progressive reduction 
of power consumption. It also provides a thermal assist unit (TAU) to allow on-chip thermal 
measurement, allowing sophisticated thermal management for high-performance portable 
systems. This chapter describes the hardware support provided by the MPC750 for power 
and thermal management. Note that the MPC755 microprocessor is a derivative of the 
MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in 
Appendix C, “MPC755 Embedded G3 Microprocessor.” 


10.1 Dynamic Power Management 


Dynamic power management (DPM) automatically powers up and down the individual 
execution units of the MPC750, based upon the contents of the instruction stream. For 
example, if no floating-point instructions are being executed, the floating-point unit is 
automatically powered down. Power is not actually removed from the execution unit; 
instead, each execution unit has an independent clock input, which is automatically 
controlled on a clock-by-clock basis. Since CMOS circuits consume negligible power when 
they are not switching, stopping the clock to an execution unit effectively eliminates its 
power consumption. The operation of DPM is completely transparent to software or any 
external hardware. Dynamic power management is enabled by setting HID0[DPM] to 1.


10.2 Programmable Power Modes 


The MPC750 provides four programmable power states: full power, doze, nap, and sleep.
Software selects these modes by setting one (and only one) of the three power saving mode
bits in the HID0 register. Hardware can enable a power management state through external
asynchronous interrupts. Such a hardware interrupt causes the transfer of program flow to
interrupt handler code that then invokes the appropriate power saving mode. The MPC750
provides a separate interrupt and interrupt vector for power management: the system
management interrupt (SMI). The MPC750 also contains a decrementer, which allows it to
enter the nap or doze mode for a predetermined amount of time and then return to full-power
operation through a decrementer interrupt. Note that the MPC750 cannot switch from one
power management mode to another without first returning to full-power mode. The sleep
mode disables bus snooping; therefore, a hardware handshake is provided to ensure
coherency before the MPC750 enters this power management mode. Table 10-1
summarizes the four power states.


Table 10-1. MPC750 Microprocessor Programmable Power Modes

PM Mode       Functioning Units          Activation Method        Full-Power Wake Up Method
------------  -------------------------  -----------------------  ---------------------------------
Full power    All units active           -                        -

Full power    Requested logic by demand  By instruction dispatch  -
(with DPM)

Doze          • Bus snooping             Controlled by software   External asynchronous exceptions*
              • Data cache as needed                              Decrementer interrupt
              • Decrementer timer                                 Performance monitor interrupt
                                                                  Thermal management interrupt
                                                                  Reset

Nap           • Bus snooping (enabled    Controlled by hardware   External asynchronous exceptions
                by deassertion of QACK)  and software             Decrementer interrupt
              • Decrementer timer                                 Performance monitor interrupt
                                                                  Thermal management interrupt
                                                                  Reset

Sleep         None                       Controlled by hardware   External asynchronous exceptions
                                         and software             Performance monitor interrupt
                                                                  Thermal management interrupt
                                                                  Reset

Note: * Exceptions are referred to as interrupts in the architecture specification.


10.2.1 Power Management Modes 


The following sections describe the characteristics of the MPC750’s power management 
modes, the requirements for entering and exiting the various modes, and the system 
capabilities provided by the MPC750 while the power management modes are active. 


10.2.1.1 Full-Power Mode with DPM Disabled


Full-power mode with DPM disabled is selected when the DPM enable bit (bit 11) in HID0
is cleared.

• Default state following power-up and HRESET
• All functional units are operating at full processor speed at all times.


10.2.1.2 Full-Power Mode with DPM Enabled 


Full-power mode with DPM enabled (HID0[DPM] = 1) provides on-chip power
management without affecting the functionality or performance of the MPC750.

• Required functional units are operating at full processor speed.
• Functional units are clocked only when needed.
• No software or hardware intervention is required after mode is set.
• Software/hardware and performance transparent


10.2.1.3 Doze Mode


Doze mode disables most functional units but maintains cache coherency by enabling the
bus interface unit and snooping. A snoop hit causes the MPC750 to enable the data cache,
copy the data back to memory, disable the cache, and fully return to the doze state.

• Most functional units disabled
• Bus snooping and time base/decrementer still enabled
• Doze mode sequence
  - Set doze bit (HID0[8] = 1), clear nap and sleep bits (HID0[9] and HID0[10] = 0)
  - MPC750 enters doze mode after several processor clocks
• Several methods of returning to full-power mode
  - Assert INT, SMI, MCP, decrementer, performance monitor, or thermal
    management interrupts
  - Assert hard reset or soft reset
• Transition to full-power state takes no more than a few processor cycles
• PLL running and locked to SYSCLK


10.2.1.4 Nap Mode 


The nap mode disables the MPC750 but still maintains the phase-locked loop (PLL),
delay-locked loop (DLL), L2CLK_OUTA and L2CLK_OUTB output signals, and the time
base/decrementer. The time base can be used to restore the MPC750 to the full-power state
after a programmed amount of time. To maintain data coherency, bus snooping is disabled
for nap and sleep modes through a hardware handshake sequence using the quiesce request
(QREQ) and quiesce acknowledge (QACK) signals. The MPC750 asserts the QREQ signal
to indicate that it is ready to disable bus snooping. When the system has ensured that
snooping is no longer necessary, it will assert QACK and the MPC750 will enter the nap
mode. If the system determines that a bus snoop cycle is required, QACK is negated to the
MPC750 for at least eight bus clock cycles, and the MPC750 will then be able to respond to
a snoop cycle. Assertion of QACK following the snoop cycle will again disable the
MPC750’s snoop capability. The MPC750’s power dissipation while in nap mode with
QACK negated is the same as the power dissipation while in doze mode.


Note that when in nap mode the DLL should be kept locked to enable a quick recovery to
full-power mode without having to wait for the DLL to re-lock. Additionally, an L2ZZ
signal is provided by the MPC750’s L2 cache interface to drive external SRAM into a
low-power mode when the nap or sleep modes are invoked. The L2ZZ signal is enabled by
setting the L2CR[CTL] bit to 1. Note that if bus snooping is to be performed through
negation of the QACK signal, the L2CR[CTL] bit should always be cleared to 0.

• Time base/decrementer still enabled
• Most functional units disabled
• All nonessential input receivers disabled
• Nap mode sequence
  - Set nap bit (HID0[9] = 1), clear doze and sleep bits (HID0[8] and HID0[10] = 0)
  - MPC750 asserts quiesce request (QREQ) signal
  - System asserts quiesce acknowledge (QACK) signal
  - MPC750 enters nap mode after several processor clocks
• Nap mode bus snoop sequence
  - System deasserts QACK signal for eight or more bus clock cycles
  - MPC750 snoops address tenure(s) on bus
  - System asserts QACK signal to restore full nap mode
• Several methods of returning to full-power mode
  - Assert INT, SMI, MCP, decrementer, performance monitor, or thermal
    management interrupts
  - Assert hard reset or soft reset
• Transition to full-power takes no more than a few processor cycles
• PLL and DLL running and locked to SYSCLK


10.2.1.5 Sleep Mode 


Sleep mode consumes the least amount of power of the four modes since all functional units 
are disabled. To conserve the maximum amount of power, the PLL may be disabled by 
placing the PLL_CFG signals in the PLL bypass mode, and disabling SYSCLK. Note that 
forcing the SYSCLK signal into a static state does not disable the MPC750’s PLL, which 
will continue to operate internally at an undefined frequency unless placed in PLL bypass 
mode. Additionally, if the PLL is not disabled, the L2 cache interface DLL will remain 
locked and the L2CLK_OUTA and L2CLK_OUTB signals will remain active. The DLL is 
disabled by clearing the L2CR[L2E] bit to 0. 


Due to the fully static design of the MPC750, internal processor state is preserved when no 
internal clock is present. Because the time base and decrementer are disabled while the 
MPC750 is in sleep mode, the MPC750’s time base contents will have to be updated from 
an external time base after exiting sleep mode if maintaining an accurate time-of-day is 
required. Before entering the sleep mode, the MPC750 asserts the QREQ signal to indicate 
that it is ready to disable bus snooping. When the system has ensured that snooping is no 
longer necessary, it asserts QACK and the MPC750 will enter sleep mode. 








• All functional units disabled (including bus snooping and time base)
• All nonessential input receivers disabled
  - Internal clock regenerators disabled
  - PLL and DLL still running (see below)
• Sleep mode sequence
  - Set sleep bit (HID0[10] = 1), clear doze and nap bits (HID0[8] and HID0[9] = 0)
  - MPC750 asserts quiesce request (QREQ)
  - System asserts quiesce acknowledge (QACK)
  - MPC750 enters sleep mode after several processor clocks
• Several methods of returning to full-power mode
  - Assert INT, SMI, or MCP interrupts
  - Assert hard reset or soft reset
• PLL and DLL may be disabled and SYSCLK may be removed while in sleep mode
• Return to full-power mode after PLL and SYSCLK are disabled in sleep mode
  - Enable SYSCLK
  - Reconfigure PLL into desired processor clock mode
  - System logic waits for PLL startup and relock time (100 µs)
  - System logic asserts one of the sleep recovery signals (for example, INT or SMI)
  - Reconfigure DLL, wait for DLL relock (640 L2 clock cycles) and re-enable L2
    cache through the L2CR


10.2.2 Power Management Software Considerations 


Since the MPC750 is a dual-issue processor with out-of-order execution capability, care 
must be taken in how the power management mode is entered. Furthermore, nap and sleep 
modes require all outstanding bus operations to be completed before these power 
management modes are entered. Normally, during system configuration time, one of the 
power management modes would be selected by setting the appropriate HID0 mode bit.
Later on, the power management mode is invoked by setting the MSR[POW] bit. To ensure 
a clean transition into and out of a power management mode, set the MSR[EE] bit to 1 and 
execute the following code sequence: 


sync 
mtmsr[POW = 1] 
isync 


loop: b loop 
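

The bit selections described in Sections 10.2.1.3 through 10.2.1.5 and the MSR[POW] step
can be sketched in C as mask arithmetic. This is an illustrative sketch only, not code from
this manual: the macro and function names are hypothetical, PowerPC bit numbering (bit 0 is
the most-significant bit of a 32-bit register) is assumed, and MSR[POW] and MSR[EE] are
assumed to be bits 13 and 16 as defined for 32-bit implementations in the Programming
Environments Manual. The register writes themselves (mtspr to HID0, mtmsr) still have to be
issued from supervisor-level assembly using the sequence shown above.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC numbers bits from the most-significant end: bit 0 is the MSB. */
#define PPC_BIT(n) ((uint32_t)1u << (31 - (n)))

#define HID0_DOZE  PPC_BIT(8)   /* HID0[8]  */
#define HID0_NAP   PPC_BIT(9)   /* HID0[9]  */
#define HID0_SLEEP PPC_BIT(10)  /* HID0[10] */
#define HID0_DPM   PPC_BIT(11)  /* HID0[11], DPM enable */
#define MSR_POW    PPC_BIT(13)  /* assumed bit position, see text */
#define MSR_EE     PPC_BIT(16)  /* assumed bit position, see text */

/* Hypothetical helper: return a HID0 value with exactly one of the three
   power saving mode bits set. All other HID0 bits are preserved. */
static uint32_t hid0_select_mode(uint32_t hid0, uint32_t mode_bit)
{
    assert(mode_bit == HID0_DOZE || mode_bit == HID0_NAP ||
           mode_bit == HID0_SLEEP);
    hid0 &= ~(HID0_DOZE | HID0_NAP | HID0_SLEEP);
    return hid0 | mode_bit;
}

/* Hypothetical helper: MSR value to write with mtmsr to invoke the
   selected mode, with external interrupts enabled so the part can wake. */
static uint32_t msr_invoke_power_mode(uint32_t msr)
{
    return msr | MSR_EE | MSR_POW;
}
```

Clearing all three mode bits before setting one keeps the "one and only one" rule from
Section 10.2 true even if a different mode had been selected earlier.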


10.3 Thermal Assist Unit 


With the increasing power dissipation of high-performance processors and operating 
conditions that span a wider range of temperatures than desktop systems, thermal 
management becomes an essential part of system design to ensure reliable operation of 
portable systems. One key aspect of thermal management is ensuring that the junction 
temperature of the microprocessor does not exceed the operating specification. While the
case temperature can be measured with an external thermal sensor, the thermal constant 
from the junction to the case can be large, and accuracy can be a problem. This may lead to 
lower overall system performance due to the necessary compensation to alleviate 
measurement deficiencies. 


The MPC750 provides the system designer an efficient means of monitoring junction 
temperature through the incorporation of an on-chip thermal sensor and programmable 
control logic to enable a thermal management implementation tightly coupled to the 
processor for improved performance and reliability. 

10.3.1 Thermal Assist Unit Overview 


The on-chip thermal assist unit (TAU) is composed of a thermal sensor, a digital-to-analog 
converter (DAC), a comparator, control logic, and three dedicated SPRs. See Figure 10-1 
for a block diagram of the TAU. 


[Block diagram not reproduced: thermal sensor, thermal sensor control logic, THRM3,
interrupt control, and the thermal interrupt request (0x1700).]























Figure 10-1. Thermal Assist Unit Block Diagram 














The TAU provides thermal control by periodically comparing the MPC750’s junction 
temperature against user-programmed thresholds, and generating a thermal management 
interrupt if the threshold values are crossed. The TAU also enables the user to determine the 
junction temperature through a software successive approximation routine. 


The TAU is controlled through three supervisor-level SPRs, accessed through the 
mtspr/mfspr instructions. Two of the SPRs (THRM1 and THRM2) provide temperature 
threshold values that can be compared to the junction temperature value, and control bits 
that enable comparison and thermal interrupt generation. The third SPR (THRM3) provides
a TAU enable bit and a sample interval timer. Note that all the bits in THRM1, THRM2, 
and THRM3 are cleared to 0 during a hard reset, and the TAU remains idle and in a 
low-power state until configured and enabled. 


The bit fields in the THRM1 and THRM2 SPRs are described in Table 10-2.


Table 10-2. THRM1 and THRM2 Bit Field Settings

Bits   Field      Description
-----  ---------  -------------------------------------------------------------------------
0      TIN        Thermal management interrupt bit. Read only. This bit is set if the
                  thermal sensor output crosses the threshold specified in the SPR. The
                  state of this bit is valid only if TIV is set. The interpretation of the
                  TIN bit is controlled by the TID bit.

1      TIV        Thermal management interrupt valid. Read only. This bit is set by the
                  thermal assist logic to indicate that the thermal management interrupt
                  (TIN) state is valid.

2-8    Threshold  Threshold value that the output of the thermal sensor is compared to.
                  The threshold range is between 0° and 127° C, and each bit represents
                  1° C. Note that this is not the resolution of the thermal sensor.

9-28   -          Reserved. System software should clear these bits to 0.

29     TID        Thermal management interrupt direction bit. Selects the result of the
                  temperature comparison to set TIN. If TID is cleared to 0, TIN is set
                  and an interrupt occurs if the junction temperature exceeds the
                  threshold. If TID is set to 1, TIN is set and an interrupt is indicated
                  if the junction temperature is below the threshold.

30     TIE        Thermal management interrupt enable. Enables assertion of the thermal
                  management interrupt signal. The thermal management interrupt is
                  maskable by the MSR[EE] bit. If TIE is cleared to 0 and THRMn is valid,
                  the TIN bit records the status of the junction temperature vs. threshold
                  comparison without asserting an interrupt signal. This feature allows
                  system software to make a successive approximation to estimate the
                  junction temperature.

31     V          SPR valid bit. This bit is set to indicate that the SPR contains valid
                  threshold, TID, and TIE control bits. Setting THRM1/2[V] and THRM3[E]
                  to 1 enables operation of the thermal sensor.
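

The field layout of THRM1 and THRM2 can be sketched as a small C encoder. This is an
illustrative sketch rather than code from this manual: the names are hypothetical, and it
assumes that the 7-bit threshold field occupies bits 2-8 (the positions implied by the
0° to 127° C range and the reserved field starting at bit 9), with bit 0 being the
most-significant bit of the register.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC numbers bits from the most-significant end: bit 0 is the MSB. */
#define PPC_BIT(n) ((uint32_t)1u << (31 - (n)))

#define THRM_THRESHOLD_SHIFT 23  /* assumed: field ends at bit 8, 31 - 8 = 23 */
#define THRM_TID PPC_BIT(29)     /* interrupt direction */
#define THRM_TIE PPC_BIT(30)     /* interrupt enable */
#define THRM_V   PPC_BIT(31)     /* SPR valid */

/* Hypothetical helper: build a THRM1 or THRM2 value with V set.
   TIN and TIV are read-only status bits and are left at 0. */
static uint32_t thrm_encode(unsigned threshold_c, int tid, int tie)
{
    assert(threshold_c <= 127);  /* 7-bit field, 1 degree C per count */
    uint32_t value = (uint32_t)threshold_c << THRM_THRESHOLD_SHIFT;
    if (tid)
        value |= THRM_TID;
    if (tie)
        value |= THRM_TIE;
    return value | THRM_V;
}
```

For example, under these assumptions a 90° C threshold that interrupts when the junction
temperature exceeds it would be built with thrm_encode(90, 0, 1) and then written with
mtspr.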








The bit fields in the THRM3 SPR are described in Table 10-3.


Table 10-3. THRM3 Bit Field Settings

Bits   Name  Description
-----  ----  ------------------------------------------------------------------------------
0-17   -     Reserved for future use. System software should clear these bits to 0.

18-30  SITV  Sample interval timer value. Number of elapsed processor clock cycles before
             a junction temperature vs. threshold comparison result is sampled for TIN bit
             setting and interrupt generation. This is necessary due to the thermal
             sensor, DAC, and the analog comparator settling time being greater than the
             processor cycle time. The value should be configured to allow a sampling
             interval of 20 microseconds.

31     E     Enables the thermal sensor compare operation if either THRM1[V] or THRM2[V]
             is set to 1.








10.3.2 Thermal Assist Unit Operation 


The TAU can be programmed to operate in single or dual threshold modes, which results in 
the TAU generating a thermal management interrupt when one or both threshold values are 
crossed. In addition, with the appropriate software routine, the TAU can also directly 
determine the junction temperature. The following sections describe the configuration of 
the TAU to support these modes of operation. 


10.3.2.1 TAU Single Threshold Mode


When the TAU is configured for single threshold mode, either THRM1 or THRM2 can be
used to contain the threshold value, and a thermal management interrupt is generated when
the threshold value is crossed. To configure the TAU for single threshold operation, set the
desired temperature threshold, TID, TIE, and V bits for either THRM1 or THRM2. The
unused THRMn threshold SPR should be disabled by clearing the V bit to 0. In this
discussion THRMn refers to the THRM threshold SPR (THRM1 or THRM2) selected to
contain the active threshold value.


After setting the desired operational parameters, the TAU is enabled by setting the 
THRM3[E] bit to 1, and placing a value allowing a sample interval of 20 microseconds or 
greater in the THRM3[SITV] field. The THRM3[SITV] setting determines the number of 
processor clock cycles between input to the DAC and sampling of the comparator output; 
accordingly, the use of a value smaller than recommended in the THRM3[SITV] field can 
cause inaccuracies in the sensed temperature. 
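

The THRM3[SITV] value for a given processor clock can be worked out as cycles = clock
frequency × sample interval. The following C sketch is illustrative only: the function
name is hypothetical, and it assumes SITV counts processor clock cycles directly in the
13-bit field at bits 18-30, with E at bit 31, as listed in Table 10-3.

```c
#include <assert.h>
#include <stdint.h>

#define THRM3_E          0x00000001u  /* bit 31 */
#define THRM3_SITV_SHIFT 1            /* SITV occupies bits 18-30 */
#define THRM3_SITV_MAX   0x1FFFu      /* largest value a 13-bit field holds */

/* Hypothetical helper: build a THRM3 value that enables the TAU with a
   sample interval of at least interval_us microseconds. */
static uint32_t thrm3_encode(uint32_t cpu_hz, uint32_t interval_us)
{
    /* Round up so the interval is never shorter than requested. */
    uint64_t cycles = ((uint64_t)cpu_hz * interval_us + 999999u) / 1000000u;
    assert(cycles <= THRM3_SITV_MAX);  /* must fit in the SITV field */
    return ((uint32_t)cycles << THRM3_SITV_SHIFT) | THRM3_E;
}
```

At 400 MHz a 20-microsecond interval is 8000 cycles, which fits in the field. Under the
assumptions above, the 13-bit field tops out at 8191 cycles, so the assertion guards
against clock and interval combinations that cannot be represented.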


If the junction temperature does not cross the programmed threshold, the THRMn[TIN] bit 
is cleared to 0 to indicate that no interrupt is required, and the THRMn[TIV] bit is set to 1 
to indicate that the TIN bit state is valid. If the threshold value has been crossed, the 
THRMn[TIN] and THRMn[TIV] bits are set to 1, and a thermal management interrupt is
generated if both the THRMn[TIE] and MSR[EE] bits are set to 1. 


A thermal management interrupt is held asserted internally until recognized by the 
MPC750’s interrupt unit. Once a thermal management interrupt is recognized, further 
temperature sampling is suspended, and the THRMn[TIN] and THRMn[TIV] values are 
held until an mtspr instruction is executed to THRMn. 


The execution of an mtspr instruction to THRMn anytime during TAU operation will clear 
the THRMn[TIV] bit to 0 and restart the temperature comparison. Executing an mtspr 
instruction to THRM3 will clear both THRM1[TIV] and THRM2[TIV] bits to 0, and restart 
temperature comparison in THRMn if the THRM3[E] bit is set to 1. 


Examples of valid THRM1 and THRM2 bit settings are shown in Table 10-4.


Table 10-4. Valid THRM1 and THRM2 Bit Settings

TIN(1)  TIV(1)  TID  TIE  V   Description
------  ------  ---  ---  --  ------------------------------------------------------------
x       x       x    x    0   The threshold in the SPR will not be used for comparison.

x       x       x    0    1   Threshold is used for comparison, thermal management
                              interrupt assertion is disabled.

x       x       0    0    1   Set TIN and do not assert thermal management interrupt if
                              the junction temperature exceeds the threshold.

x       x       0    1    1   Set TIN and assert thermal management interrupt if the
                              junction temperature exceeds the threshold.

x       x       1    0    1   Set TIN and do not assert thermal management interrupt if
                              the junction temperature is less than the threshold.

x       x       1    1    1   Set TIN and assert thermal management interrupt if the
                              junction temperature is less than the threshold.

x       0       x    x    1   The state of the TIN bit is not valid.

0       1       0    x    1   The junction temperature is less than the threshold and as a
                              result the thermal management interrupt is not generated for
                              TIE = 1.

1       1       0    x    1   The junction temperature is greater than the threshold and
                              as a result the thermal management interrupt is generated if
                              TIE = 1.

0       1       1    x    1   The junction temperature is greater than the threshold and
                              as a result the thermal management interrupt is not
                              generated for TIE = 1.

1       1       1    x    1   The junction temperature is less than the threshold and as a
                              result the thermal management interrupt is generated if
                              TIE = 1.

Note: (1) The TIN and TIV bits are read-only status bits.


10.3.2.2 TAU Dual-Threshold Mode 


The configuration and operation of the TAU’s dual-threshold mode is similar to single 
threshold mode, except both THRM1 and THRM2 are configured with desired threshold 
and TID values, and the TIE and V bits are set to 1. When the THRM3[E] bit is set to 1 to 
enable temperature measurement and comparison, the first comparison is made with 
THRM1. If no thermal management interrupt results from the comparison, the number of 
processor cycles specified in THRM3[SITV] elapses, and the next comparison is made with 
THRM2. If no thermal management interrupt results from the THRM2 comparison, the 
time specified by THRM3[SITV] again elapses, and the comparison returns to THRM1. 


This sequence of comparisons continues until a thermal management interrupt occurs, or 
the TAU is disabled. When a comparison results in an interrupt, the comparison with the 
threshold SPR causing the interrupt is halted, but comparisons continue with the other 
threshold SPR. Following a thermal management interrupt, the interrupt service routine 
must read both THRM1 and THRM2 to determine which threshold was crossed. Note that 
it is possible for both threshold values to have been crossed, in which case the TAU ceases 
making temperature comparisons until an mtspr instruction is executed to one or both of 
the threshold SPRs. 
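

The check that an interrupt service routine performs on the two threshold SPRs can be
sketched in C as follows. This is an illustrative sketch, not code from this manual: the
names are hypothetical, the two arguments stand for values already read from THRM1 and
THRM2 with mfspr, and TIN and TIV are assumed to be bits 0 and 1 (the two most-significant
bits) of each register.

```c
#include <stdint.h>

#define THRM_TIN 0x80000000u  /* assumed bit 0: threshold crossed */
#define THRM_TIV 0x40000000u  /* assumed bit 1: TIN state is valid */

enum {
    CROSSED_NONE  = 0,
    CROSSED_THRM1 = 1,
    CROSSED_THRM2 = 2  /* both crossed: CROSSED_THRM1 | CROSSED_THRM2 */
};

/* TIN is meaningful only while TIV is set, so both bits are tested. */
static int thrm_crossed(uint32_t thrm)
{
    return (thrm & THRM_TIV) != 0 && (thrm & THRM_TIN) != 0;
}

/* Hypothetical helper: report which threshold SPR(s) signaled a crossing. */
static int tau_which_crossed(uint32_t thrm1, uint32_t thrm2)
{
    int result = CROSSED_NONE;
    if (thrm_crossed(thrm1))
        result |= CROSSED_THRM1;
    if (thrm_crossed(thrm2))
        result |= CROSSED_THRM2;
    return result;
}
```

A handler built on this would then execute an mtspr to each SPR reported as crossed in
order to restart comparisons, as described above.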


10.3.2.3 MPC750 Junction Temperature Determination 


While the MPC750’s TAU does not implement an analog-to-digital converter to enable the 
direct determination of the junction temperature, system software can execute a simple 
successive approximation routine to find the junction temperature. 


The TAU configuration used to approximate the junction temperature is the same required 
for single-threshold mode, except that the threshold SPR selected has its TIE bit cleared to 
0 to disable thermal management interrupt generation. Once the TAU is enabled, the 
successive approximation routine loads a threshold value into the active threshold SPR, and
then continuously polls the threshold SPR's TIV bit until it is set to 1, indicating a valid TIN
bit. The successive approximation routine can then evaluate the TIN bit value, and then 
increment or decrement the threshold value for another comparison. This process is 
continued until the junction temperature is determined. 
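

One possible shape for such a routine is a binary search over the 0° to 127° C threshold
range, which is one form of successive approximation and needs seven comparisons. The C
sketch below is illustrative only and is not code from this manual: the function names are
hypothetical, and the hardware access is abstracted behind a callback that is assumed to
program the threshold with TID = 0 and TIE = 0, wait for TIV = 1, and return the TIN bit.
A stand-in callback is included so the logic can be exercised on a host machine.

```c
#include <stdint.h>

/* Hypothetical callback: returns nonzero when the junction temperature
   exceeds threshold_c (that is, TIN = 1 with TID = 0 once TIV = 1). */
typedef int (*tau_compare_fn)(unsigned threshold_c, void *ctx);

/* Binary search for the lowest threshold the temperature does not exceed.
   The result is clamped to the 0..127 range of the threshold field. */
static unsigned tau_estimate_temp(tau_compare_fn exceeds, void *ctx)
{
    unsigned lo = 0, hi = 127;
    while (lo < hi) {
        unsigned mid = (lo + hi) / 2;
        if (exceeds(mid, ctx))
            lo = mid + 1;  /* temperature is above mid */
        else
            hi = mid;      /* temperature is at or below mid */
    }
    return lo;
}

/* Stand-in for the TAU, for host-side testing only: ctx points to the
   simulated junction temperature in degrees C. */
static int fake_compare(unsigned threshold_c, void *ctx)
{
    return *(const unsigned *)ctx > threshold_c;
}
```

Because each threshold count represents 1° C but the sensor resolution is coarser (see
Table 10-2), a result from a routine like this is an estimate rather than an exact reading.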


10.3.2.4 Power Saving Modes and TAU Operation 


The static power saving modes provided by the MPC750 (the nap, doze, and sleep modes) 
allow the temperature of the processor to be lowered quickly, and can be invoked through 
the use of the TAU and associated thermal management interrupt. The TAU remains 
operational in the nap and doze modes, and in sleep mode as long as the SYSCLK signal 
input remains active. If the SYSCLK signal is made static when sleep mode is invoked, the 
TAU is rendered inactive. If the MPC750 is entering sleep mode with SYSCLK disabled, 
the TAU should be configured to disable thermal management interrupts to avoid an 
unwanted thermal management interrupt when the SYSCLK input signal is restored. 


10.4 Instruction Cache Throttling 


The MPC750 provides an instruction cache throttling mechanism to effectively reduce the 
instruction execution rate without the complexity and overhead of dynamic clock control. 
Instruction cache throttling, when used in conjunction with the TAU and the dynamic power 
management capability of the MPC750, provides the system designer with a flexible means 
of controlling device temperature while allowing the processor to continue operating. 


The instruction cache throttling mechanism simply reduces the instruction forwarding rate
from the instruction cache to the instruction dispatcher. Normally, the instruction cache
forwards four instructions to the instruction dispatcher every clock cycle if all the
instructions hit in the cache. For thermal management the MPC750 provides a
supervisor-level instruction cache throttling control (ICTC) SPR. The instruction
forwarding rate is reduced by writing a nonzero value into the ICTC[FI] field, and enabling
instruction cache throttling by setting the ICTC[E] bit to 1. The overall junction
temperature reduction results from dynamic power management reducing the power to the
execution units while waiting for instructions to be forwarded from the instruction cache;
thus, instruction cache throttling does not provide thermal reduction unless HID0[DPM] is
set to 1. Note that during instruction cache throttling the configuration of the PLL and DLL
remains unchanged.


The bit field settings of the ICTC SPR are shown in Table 10-5. 


Table 10-5. ICTC Bit Field Settings

Bits   Name  Description
-----  ----  ------------------------------------------------------------
23-30  FI    Instruction forwarding interval expressed in processor clocks.
             0x00: 0 clock cycles
             0x01: 1 clock cycle
             0xFF: 255 clock cycles

31     E     Cache throttling enable
             0 Disable instruction cache throttling.
             1 Enable instruction cache throttling.
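

The ICTC value for a chosen forwarding interval can be sketched in C as follows. This is
an illustrative sketch, not code from this manual: the names are hypothetical, and it
assumes bit 0 is the most-significant bit of the register, so that FI (bits 23-30) sits
one position above E (bit 31).

```c
#include <assert.h>
#include <stdint.h>

#define ICTC_E        0x00000001u  /* bit 31: throttling enable */
#define ICTC_FI_SHIFT 1            /* FI occupies bits 23-30 */

/* Hypothetical helper: build an ICTC value from a forwarding interval in
   processor clocks (0..255) and an enable flag. */
static uint32_t ictc_encode(unsigned fi_clocks, int enable)
{
    assert(fi_clocks <= 255);  /* FI is an 8-bit field */
    return ((uint32_t)fi_clocks << ICTC_FI_SHIFT) | (enable ? ICTC_E : 0u);
}
```

The resulting value would be written with mtspr from supervisor-level code. As noted
above, throttling lowers temperature only when HID0[DPM] is also set to 1.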


Chapter 11
Performance Monitor 


This chapter describes the performance monitor of the MPC750. Note that the MPC755 
microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply 
for the MPC755 except as noted in Appendix C, “MPC755 Embedded G3 
Microprocessor.” 


The performance monitor facility provides the ability to monitor and count predefined 
events such as processor clocks, misses in the instruction cache, data cache, or L2 cache, 
types of instructions dispatched, mispredicted branches, and other occurrences. The count 
of such events (which may be an approximation) can be used to trigger the performance 
monitor exception. The performance monitor facility is not defined by the PowerPC 
architecture. 


The performance monitor can be used for the following: 


• To increase system performance with efficient software, especially in a
  multiprocessing system. Memory hierarchy behavior may be monitored and studied
  in order to develop algorithms that schedule tasks (and perhaps partition them) and
  that structure and distribute data optimally.
• To improve processor architecture, the detailed behavior of the MPC750’s structure
  must be known and understood in many software environments. Some environments
  may not be easily characterized by a benchmark or trace.
• To help system developers bring up and debug their systems.


The performance monitor uses the following MPC750-specific special-purpose registers
(SPRs):

• The performance monitor counter registers (PMC1–PMC4) are used to record the
  number of times a certain event has occurred. UPMC1–UPMC4 provide user-level
  read access to these registers.
• The monitor mode control registers (MMCR0–MMCR1) are used to enable various
  performance monitor interrupt functions and select events to count.
  UMMCR0–UMMCR1 provide user-level read access to these registers.
• The sampled instruction address register (SIA) contains the effective address of an
  instruction executing at or around the time that the processor signals the
  performance monitor interrupt condition. USIA provides user-level read access to
  the SIA.
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Four 32-bit counters in the MPC750 count occurrences of software-selectable events. Two 
control registers (MMCRO and MMCRI1) are used to control performance monitor 
operation. The counters and the control registers are supervisor-level SPRs; however, in the 
MPC750, the contents of these registers can be read by user-level software using separate 
SPRs (UMMCRO and UMMCR1). Control fields in the MMCRO and MMCRI select the 
events to be counted, can enable a counter overflow to initiate a performance monitor 
exception, and specify the conditions under which counting is enabled. 


As with other PowerPC exceptions, the performance monitor interrupt follows the normal
PowerPC exception model with a defined exception vector offset (0x00F00). Its priority is
below the external interrupt and above the decrementer interrupt.


11.1 Performance Monitor Interrupt


The performance monitor provides the ability to generate a performance monitor interrupt
triggered by a counter overflow condition in one of the performance monitor counter
registers (PMC1-PMC4), shown in Figure 11-3. A counter is considered to have
overflowed when its most-significant bit is set. A performance monitor interrupt may also
be caused by a 0-to-1 transition of certain bits in the time base register, which provides
a way to generate a time reference-based interrupt.


Although the interrupt signal condition may occur with MSR[EE] = 0, the actual exception 
cannot be taken until MSR[EE] = 1. 


When a performance monitor exception is signaled, the action taken depends on the type
of event that caused the condition:

• Threshold-related events: When a threshold event signals a performance monitor
  exception, the address of the instruction that caused the counter to overflow is
  saved in the SIA register.

• Programmable events: To help track which part of the code was being executed
  when an exception was signaled, the address of the last completed instruction during
  that cycle is saved in the SIA.


Exception handling for the performance monitor interrupt exception is described in
Section 4.5.13, “Performance Monitor Interrupt (0x00F00).”


11.2 Special-Purpose Registers Used by Performance 
Monitor 


The performance monitor incorporates the SPRs listed in Table 11-1. The supervisor-level
registers are accessed through the mtspr and mfspr instructions; the user-level registers
are read-only and are accessed through mfspr.

Table 11-1. Performance Monitor SPRs

SPR Number   spr[5-9] || spr[0-4]   Register Name   Access Level
952          0b11101 11000          MMCR0           Supervisor
953          0b11101 11001          PMC1            Supervisor
954          0b11101 11010          PMC2            Supervisor
955          0b11101 11011          SIA             Supervisor
956          0b11101 11100          MMCR1           Supervisor
957          0b11101 11101          PMC3            Supervisor
958          0b11101 11110          PMC4            Supervisor
936          0b11101 01000          UMMCR0          User (read only)
937          0b11101 01001          UPMC1           User (read only)
938          0b11101 01010          UPMC2           User (read only)
939          0b11101 01011          USIA            User (read only)
940          0b11101 01100          UMMCR1          User (read only)
941          0b11101 01101          UPMC3           User (read only)
942          0b11101 01110          UPMC4           User (read only)
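
The second column of Table 11-1 shows each 10-bit SPR number as two 5-bit halves. The following Python sketch illustrates how those halves could be placed in an mfspr or mtspr instruction word. It assumes the usual PowerPC encoding, in which the low-order half of the SPR number occupies instruction bits 11-15 and the high-order half occupies bits 16-20, together with the extended opcodes 339 (mfspr) and 467 (mtspr) listed in Appendix A. The function names are hypothetical and are not part of any tool chain, and user_spr only captures the numbering pattern visible in Table 11-1.

```python
MFSPR_XO = 339  # extended opcode of mfspr, as listed in Appendix A
MTSPR_XO = 467  # extended opcode of mtspr, as listed in Appendix A


def split_spr(spr_number):
    """Split a 10-bit SPR number into (high five bits, low five bits)."""
    if not 0 <= spr_number <= 0x3FF:
        raise ValueError("SPR number must fit in 10 bits")
    return (spr_number >> 5) & 0x1F, spr_number & 0x1F


def _encode(xo, gpr, spr_number):
    """Build the instruction word; the field layout is an assumption."""
    if not 0 <= gpr <= 31:
        raise ValueError("GPR number must be in the range 0-31")
    high, low = split_spr(spr_number)
    # opcode 31 | GPR | low half (bits 11-15) | high half (bits 16-20) | XO | 0
    return (31 << 26) | (gpr << 21) | (low << 16) | (high << 11) | (xo << 1)


def encode_mfspr(rd, spr_number):
    """Instruction word for 'mfspr rD,SPR'."""
    return _encode(MFSPR_XO, rd, spr_number)


def encode_mtspr(spr_number, rs):
    """Instruction word for 'mtspr SPR,rS'."""
    return _encode(MTSPR_XO, rs, spr_number)


def user_spr(supervisor_spr):
    """User-level read-only counterpart, following the pattern in Table 11-1
    (each user-level SPR number is 16 less than the supervisor-level one)."""
    return supervisor_spr - 16


if __name__ == "__main__":
    for name, number in [("MMCR0", 952), ("PMC1", 953), ("SIA", 955)]:
        high, low = split_spr(number)
        print(f"{name}: SPR {number} = 0b{high:05b} {low:05b}, "
              f"mfspr r3 = 0x{encode_mfspr(3, number):08X}, "
              f"user SPR {user_spr(number)}")
```

An assembler normally performs this encoding; the sketch is only meant to make the split-field notation in the table concrete.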


11.2.1 Performance Monitor Registers 


This section describes the registers used by the performance monitor. 


11.2.1.1 Monitor Mode Control Register 0 (MMCRO) 


The monitor mode control register 0 (MMCR0), shown in Figure 11-1, is a 32-bit SPR
provided to specify events to be counted and recorded. MMCR0 can be written to only in
supervisor mode. User-level software can read the contents of MMCR0 by issuing an
mfspr instruction to UMMCR0, described in Section 11.2.1.2, “User Monitor Mode
Control Register 0 (UMMCR0).”


[Figure: MMCR0 field layout across bits 0-31. Fields, left to right: DIS, DP, DU, DMS,
DMR, ENINT, DISCOUNT, RTCSELECT, INTONBITTRANS, THRESHOLD, PMC1INTCONTROL,
PMCINTCONTROL, PMCTRIGGER, PMC1SELECT, PMC2SELECT. Bit positions are listed in
Table 11-2.]

Figure 11-1. Monitor Mode Control Register 0 (MMCR0)


This register must be cleared at power up. Reading this register does not change its
contents. Table 11-2 describes the bits of the MMCR0 register.

Table 11-2. MMCR0 Bit Settings

Bits    Name             Description
0       DIS              Disables counting unconditionally.
                         0  The values of the PMCn counters can be changed by hardware.
                         1  The values of the PMCn counters cannot be changed by hardware.
1       DP               Disables counting while in supervisor mode.
                         0  The PMCn counters can be changed by hardware.
                         1  If the processor is in supervisor mode (MSR[PR] is cleared), the
                            counters are not changed by hardware.
2       DU               Disables counting while in user mode.
                         0  The PMCn counters can be changed by hardware.
                         1  If the processor is in user mode (MSR[PR] is set), the PMCn
                            counters are not changed by hardware.
3       DMS              Disables counting while MSR[PM] is set.
                         0  The PMCn counters can be changed by hardware.
                         1  If MSR[PM] is set, the PMCn counters are not changed by hardware.
4       DMR              Disables counting while MSR[PM] is zero.
                         0  The PMCn counters can be changed by hardware.
                         1  If MSR[PM] is cleared, the PMCn counters are not changed by
                            hardware.
5       ENINT            Enables performance monitor interrupt signaling.
                         0  Interrupt signaling is disabled.
                         1  Interrupt signaling is enabled.
                         Cleared by hardware when a performance monitor interrupt is
                         signaled. To re-enable these interrupt signals, software must set
                         this bit after servicing the performance monitor interrupt. The IPL
                         ROM code clears this bit before passing control to the operating
                         system.
6       DISCOUNT         Disables counting of PMCn when a performance monitor interrupt is
                         signaled (that is, ((PMCnINTCONTROL = 1) & (PMCn[0] = 1) &
                         (ENINT = 1)) or the occurrence of an enabled time base transition
                         with ((INTONBITTRANS = 1) & (ENINT = 1)).
                         0  Signaling a performance monitor interrupt does not affect
                            counting status of PMCn.
                         1  The signaling of a performance monitor interrupt prevents
                            changing of PMC1 counter. The PMCn counter does not change if
                            PMC2COUNTCTL = 0.
                         Because a time base signal could have occurred along with an
                         enabled counter overflow condition, software should always reset
                         INTONBITTRANS to zero, if the value in INTONBITTRANS was a one.
7-8     RTCSELECT        64-bit time base, bit selection enable
                         00  Pick bit 63 to count
                         01  Pick bit 55 to count
                         10  Pick bit 51 to count
                         11  Pick bit 47 to count
9       INTONBITTRANS    Causes interrupt signaling on bit transition (identified in
                         RTCSELECT) from off to on.
                         0  Do not allow interrupt signal on the transition of a chosen bit.
                         1  Signal interrupt on the transition of a chosen bit.
                         Software is responsible for setting and clearing INTONBITTRANS.
10-15   THRESHOLD        Threshold value. All 6 bits are supported by the MPC750, allowing
                         threshold values from 0 to 63. The intent of the THRESHOLD support
                         is to characterize L1 data cache misses.
16      PMC1INTCONTROL   Enables interrupt signaling due to PMC1 counter overflow.
                         0  Disable PMC1 interrupt signaling due to PMC1 counter overflow.
                         1  Enable PMC1 interrupt signaling due to PMC1 counter overflow.
17      PMCINTCONTROL    Enables interrupt signaling due to any PMC2-PMC4 counter overflow.
                         Overrides the setting of DISCOUNT.
                         0  Disable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter
                            overflow.
                         1  Enable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter
                            overflow.
18      PMCTRIGGER       Can be used to trigger counting of PMC2-PMC4 after PMC1 has
                         overflowed or after a performance monitor interrupt is signaled.
                         0  Enable PMC2-PMC4 counting.
                         1  Disable PMC2-PMC4 counting until either PMC1[0] = 1 or a
                            performance monitor interrupt is signaled.
19-25   PMC1SELECT       PMC1 input selector, 128 events selectable; 25 defined. See
                         Table 11-5.
26-31   PMC2SELECT       PMC2 input selector, 64 events selectable; 21 defined. See
                         Table 11-6.

MMCR0 can be accessed with the mtspr and mfspr instructions using SPR 952.
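
Because the manual numbers bits from the most-significant end (bit 0 is the MSB), composing a control value from the fields in Table 11-2 is easy to get wrong by hand. The following Python sketch shows one possible way to assemble such a value. The dictionary and the build_mmcr0 helper are hypothetical names introduced only for illustration; the bit ranges are taken from Table 11-2.

```python
# (first bit, last bit) of each field, using the manual's numbering
# in which bit 0 is the most-significant bit of the 32-bit register.
MMCR0_FIELDS = {
    "DIS": (0, 0), "DP": (1, 1), "DU": (2, 2), "DMS": (3, 3), "DMR": (4, 4),
    "ENINT": (5, 5), "DISCOUNT": (6, 6), "RTCSELECT": (7, 8),
    "INTONBITTRANS": (9, 9), "THRESHOLD": (10, 15),
    "PMC1INTCONTROL": (16, 16), "PMCINTCONTROL": (17, 17),
    "PMCTRIGGER": (18, 18), "PMC1SELECT": (19, 25), "PMC2SELECT": (26, 31),
}


def build_mmcr0(**fields):
    """Combine named field values into a 32-bit register image."""
    value = 0
    for name, field_value in fields.items():
        first, last = MMCR0_FIELDS[name]
        width = last - first + 1
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        # Bit 'last' sits (31 - last) positions above the least-significant bit.
        value |= field_value << (31 - last)
    return value


if __name__ == "__main__":
    # Count processor cycles on PMC1 and completed instructions on PMC2,
    # with interrupt signaling enabled for a PMC1 overflow.
    image = build_mmcr0(ENINT=1, PMC1INTCONTROL=1,
                        PMC1SELECT=0b0000001, PMC2SELECT=0b000010)
    print(f"MMCR0 image: 0x{image:08X}")
```

The resulting value would then be written with mtspr in supervisor mode; that step is outside the scope of this sketch.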


11.2.1.2 User Monitor Mode Control Register 0 (UMMCR0)


The contents of MMCR0 are reflected to UMMCR0, which can be read by user-level
software. UMMCR0 can be accessed with the mfspr instruction using SPR 936.


11.2.1.3 Monitor Mode Control Register 1 (MMCR1)


The monitor mode control register 1 (MMCR1) functions as an event selector for
performance monitor counter registers 3 and 4 (PMC3 and PMC4). The MMCR1 register
is shown in Figure 11-2.

[Figure: MMCR1 field layout. PMC3SELECT occupies bits 0-4, PMC4SELECT occupies
bits 5-9, and bits 10-31 are reserved (shown as zeros).]

Figure 11-2. Monitor Mode Control Register 1 (MMCR1)


Bit settings for MMCR1 are shown in Table 11-3. The corresponding events are described
in Section 11.2.1.5, “Performance Monitor Counter Registers (PMC1-PMC4).”


Table 11-3. MMCR1 Bit Settings

Bits    Name         Description
0-4     PMC3SELECT   PMC3 input selector. 32 events selectable. See Table 11-7 for
                     defined selections.
5-9     PMC4SELECT   PMC4 input selector. 32 events selectable. See Table 11-8 for
                     defined selections.
10-31   (none)       Reserved


MMCR1 can be accessed with the mtspr and mfspr instructions using SPR 956. User-level
software can read the contents of MMCR1 by issuing an mfspr instruction to UMMCR1,
described in Section 11.2.1.4, “User Monitor Mode Control Register 1 (UMMCR1).”


11.2.1.4 User Monitor Mode Control Register 1 (UMMCR1)


The contents of MMCR1 are reflected to UMMCR1, which can be read by user-level
software. UMMCR1 can be accessed with the mfspr instruction using SPR 940.
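
A value read back through UMMCR1 can be split into the two event selectors described in Table 11-3. The Python sketch below illustrates the extraction, again using the manual's convention that bit 0 is the most-significant bit. The helper names are hypothetical, and the sample value is made up for the demonstration rather than read from hardware.

```python
def extract_field(value, first_bit, last_bit):
    """Return the field occupying first_bit..last_bit (bit 0 is the MSB)."""
    width = last_bit - first_bit + 1
    return (value >> (31 - last_bit)) & ((1 << width) - 1)


def decode_mmcr1(value):
    """Split a 32-bit MMCR1/UMMCR1 image into its two selector fields."""
    return {
        "PMC3SELECT": extract_field(value, 0, 4),
        "PMC4SELECT": extract_field(value, 5, 9),
    }


if __name__ == "__main__":
    sample = 0x2A000000  # illustrative value only
    print(decode_mmcr1(sample))
```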


11.2.1.5 Performance Monitor Counter Registers (PMC1-PMC4)


PMC1-PMC4, shown in Figure 11-3, are 32-bit counters that can be programmed to
generate interrupt signals when they overflow.

[Figure: PMCn layout. Bit 0 is the overflow bit (OV); bits 1-31 hold the counter value.]

Figure 11-3. Performance Monitor Counter Registers (PMC1-PMC4)


The bits contained in the PMC registers are described in Table 11-4.

Table 11-4. PMCn Bit Settings

Bits    Name            Description
0       OV              Overflow. When this bit is set, it indicates this counter has
                        reached its maximum value.
1-31    Counter value   Indicates the number of occurrences of the specified event.


Counters overflow when the high-order bit (the sign bit) becomes set; that is, they reach the
value 2147483648 (0x8000_0000). However, an interrupt is not signaled unless both
MMCR0[ENINT] and either PMC1INTCONTROL or PMCINTCONTROL in the
MMCR0 register are also set as appropriate.


Note that the interrupts can be masked by clearing MSR[EE]; the interrupt signal condition
may occur with MSR[EE] cleared, but the exception is not taken until MSR[EE] is set.
Setting MMCR0[DISCOUNT] forces counters to stop counting when a counter interrupt
occurs.


Software is expected to use the mtspr instruction to explicitly set the PMC registers to
non-overflowed values. Setting an overflowed value may cause an erroneous exception. For
example, if both MMCR0[ENINT] and either PMC1INTCONTROL or PMCINTCONTROL
are set and the mtspr instruction loads an overflow value, an interrupt signal may be
generated without an event counting having taken place.
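
Since a counter overflows when bit 0 becomes set, a common way to request an interrupt after a chosen number of events is to preload the counter with a value that many counts below 0x8000_0000. The Python sketch below models that arithmetic and the counter-overflow signaling condition described above. It is a simplified model with hypothetical function names: it assumes the counter advances by one per event, and it ignores time base transitions (INTONBITTRANS) and MSR[EE] masking.

```python
OVERFLOW_BIT = 0x8000_0000  # PMCn[0], the most-significant bit


def pmc_preload(events_until_overflow):
    """Counter value that overflows after the given number of events."""
    if not 1 <= events_until_overflow <= OVERFLOW_BIT:
        raise ValueError("event count must be between 1 and 2**31")
    return OVERFLOW_BIT - events_until_overflow


def pmc_overflowed(value):
    """True when the overflow bit, PMCn[0], is set."""
    return bool(value & OVERFLOW_BIT)


def interrupt_signaled(enint, pmc1intcontrol, pmcintcontrol, pmcs):
    """Model of the counter-overflow interrupt condition.

    pmcs holds the values of PMC1, PMC2, PMC3, and PMC4, in that order.
    """
    if not enint:
        return False
    pmc1_hit = pmc1intcontrol and pmc_overflowed(pmcs[0])
    other_hit = pmcintcontrol and any(pmc_overflowed(v) for v in pmcs[1:])
    return bool(pmc1_hit or other_hit)


if __name__ == "__main__":
    start = pmc_preload(1000)
    print(f"preload for 1000 events: 0x{start:08X}")
    print("overflowed after 999 events: ", pmc_overflowed(start + 999))
    print("overflowed after 1000 events:", pmc_overflowed(start + 1000))
```

Loading a value for which pmc_overflowed already returns True corresponds to the erroneous-exception case described in the preceding paragraph.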


The events to be monitored by PMC1 and PMC2 are chosen by setting MMCR0[19-31]. The
selected events are counted from the time MMCR0 is set until either MMCR0 is reset or a
performance monitor interrupt is generated. Table 11-5 lists the events selectable for
PMC1 and their encodings.

Table 11-5. PMC1 Events: MMCR0[19-25] Select Encodings

Encoding     Description
000 0000     Register holds current value.
000 0001     Number of processor cycles
000 0010     Number of instructions that have completed. Does not include folded branches.
000 0011     Number of transitions from 0 to 1 of specified bits in time base lower register.
             Bits are specified through RTCSELECT, MMCR0[7-8]. 00 = 15, 01 = 19, 10 = 23,
             11 = 31
000 0100     Number of instructions dispatched: 0, 1, or 2 instructions per cycle
000 0101     Number of eieio instructions completed
000 0110     Number of cycles spent performing table search operations for the ITLB
000 0111     Number of accesses that hit the L2
000 1000     Number of valid instruction EAs delivered to the memory subsystem
000 1001     Number of times the address of an instruction being completed matches the
             address in the IABR
000 1010     Number of loads that miss the L1 with latencies that exceeded the threshold value
000 1011     Number of branches that are unresolved when processed
000 1100     Number of cycles the dispatcher stalls due to a second unresolved branch in the
             instruction stream
All others   Reserved. May be used in a later revision.








Bits MMCR0[26-31] specify events associated with PMC2, as shown in Table 11-6.


Table 11-6. PMC2 Events: MMCR0[26-31] Select Encodings

Encoding     Description
00 0000      Register holds current value.
00 0001      Counts processor cycles.
00 0010      Counts completed instructions. Does not include folded branches.
00 0011      Counts transitions from 0 to 1 of TBL bits specified through MMCR0[RTCSELECT].
             00 = 47, 01 = 51, 10 = 55, 11 = 63.
00 0100      Counts instructions dispatched. 0, 1, or 2 instructions per cycle.
00 0101      Counts L1 instruction cache misses.
00 0110      Counts ITLB misses.
00 0111      Counts L2 instruction misses.
00 1000      Counts branches predicted or resolved not taken.
00 1001      Counts MSR[PR] bit toggles.
00 1010      Counts times reserved load operations completed.
00 1011      Counts completed load and store instructions.
00 1100      Counts snoops to the L1 and the L2.
00 1101      Counts L1 cast-outs to the L2.
00 1110      Counts completed system unit instructions.
00 1111      Counts instruction fetch misses in the L1.
01 0000      Counts branches allowing out-of-order execution that resolved correctly.
All others   Reserved.











Bits MMCR1[0-4] specify events associated with PMC3, as shown in Table 11-7.


Table 11-7. PMC3 Events: MMCR1[0-4] Select Encodings

Encoding     Description
0 0000       Register holds current value.
0 0001       Number of processor cycles
0 0010       Number of completed instructions, not including folded branches.
0 0011       Number of TBL bit transitions from 0 to 1 of specified bits in time base lower
             register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51,
             2 = 55, 3 = 63.
0 0100       Number of instructions dispatched. 0, 1, or 2 per cycle.
0 0101       Number of L1 data cache misses
0 0110       Number of DTLB misses
0 0111       Number of L2 data misses
0 1000       Number of taken branches, including predicted branches.
0 1001       Number of transitions between marked and unmarked processes while in user
             mode. That is, the number of MSR[PM] toggles while the processor is in user mode.
0 1010       Number of store conditional instructions completed
0 1011       Number of instructions completed from the FPU
0 1100       Number of L2 castouts caused by snoops to modified lines
0 1101       Number of cache operations that hit in the L2 cache
0 1110       Reserved
0 1111       Number of cycles generated by L1 load misses
1 0000       Number of branches in the second speculative stream that resolve correctly
1 0001       Number of cycles the BPU stalls due to LR or CR unresolved dependencies
All others   Reserved. May be used in a later revision.





Bits MMCR1[5-9] specify events associated with PMC4, as shown in Table 11-8.


Table 11-8. PMC4 Events: MMCR1[5-9] Select Encodings

Encoding     Comments
0 0000       Register holds current value
0 0001       Number of processor cycles
0 0010       Number of completed instructions, not including folded branches
0 0011       Number of TBL bit transitions from 0 to 1 of specified bits in time base lower
             register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51,
             2 = 55, 3 = 63.
0 0100       Number of instructions dispatched. 0, 1, or 2 per cycle
0 0101       Number of L2 castouts
0 0110       Number of cycles spent performing table searches for DTLB accesses.
0 0111       Reserved. May be used in a later revision.
0 1000       Number of mispredicted branches
0 1001       Number of transitions between marked and unmarked processes while in
             supervisor mode. That is, the number of MSR[PM] toggles while the processor is
             in supervisor mode.
0 1010       Number of store conditional instructions completed with reservation intact
0 1011       Number of completed sync instructions
0 1100       Number of snoop request retries
0 1101       Number of completed integer operations
0 1110       Number of cycles the BPU cannot process new branches due to having two
             unresolved branches
All others   Reserved. May be used in a later revision.





The PMC registers can be accessed with the mtspr and mfspr instructions using the
following SPR numbers:

• PMC1 is SPR 953
• PMC2 is SPR 954
• PMC3 is SPR 957
• PMC4 is SPR 958


11.2.1.6 User Performance Monitor Counter Registers (UPMC1-UPMC4)


The contents of PMC1-PMC4 are reflected to UPMC1-UPMC4, which can be read by
user-level software. The UPMC registers can be read with the mfspr instruction using the
following SPR numbers:

• UPMC1 is SPR 937
• UPMC2 is SPR 938
• UPMC3 is SPR 941
• UPMC4 is SPR 942


11.2.1.7 Sampled Instruction Address Register (SIA)


The sampled instruction address register (SIA) is a supervisor-level register that contains 
the effective address of an instruction executing at or around the time that the processor 
signals the performance monitor interrupt condition. The SIA is shown in Figure 11-4. 


[Figure: SIA layout. Bits 0-31 hold the instruction address.]

Figure 11-4. Sampled Instruction Address Register (SIA)


If the performance monitor interrupt is triggered by a threshold event, the SIA contains the 
address of the exact instruction (called the sampled instruction) that caused the counter to 
overflow. 


If the performance monitor interrupt was caused by something besides a threshold event, 
the SIA contains the address of the last instruction completed during that cycle. SIA can be 
accessed with the mtspr and mfspr instructions using SPR 955. 


11.2.1.8 User Sampled Instruction Address Register (USIA) 


The contents of SIA are reflected to USIA, which can be read by user-level software. USIA
can be accessed with the mfspr instruction using SPR 939.


11.3 Event Counting 


Counting can be enabled if conditions in the processor state match a software-specified
condition. Because a software task scheduler may switch a processor’s execution among
multiple processes, and because statistics on only a particular process may be of interest, a
facility is provided to mark a process. The performance monitor (PM) bit, MSR[29], is used
for this purpose. System software may set this bit when a marked process is running, which
enables statistics to be gathered only during the execution of the marked process. Together,
MSR[PR] and MSR[PM] define the current state of the processor (supervisor or user) and
of the process (marked or unmarked). If this state matches a state for which monitoring is
enabled in MMCR0, counting is enabled.


The following are states that can be monitored:

• (Supervisor) only
• (User) only
• (Marked and user) only
• (Not marked and user) only
• (Marked and supervisor) only
• (Not marked and supervisor) only
• (Marked) only
• (Not marked) only


In addition, one of two unconditional counting modes may be specified:

• Counting is unconditionally enabled regardless of the states of MSR[PM] and
  MSR[PR]. This can be accomplished by clearing MMCR0[0-4].

• Counting is unconditionally disabled regardless of the states of MSR[PM] and
  MSR[PR]. This is done by setting MMCR0[0].
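
The way the five disable bits MMCR0[0-4] combine with MSR[PR] and MSR[PM] can be expressed as a small predicate. The Python sketch below is a model of the bit descriptions in Table 11-2, not of the hardware implementation; the function name and argument order are hypothetical.

```python
def counting_enabled(dis, dp, du, dms, dmr, msr_pr, msr_pm):
    """Return True if the counters may be changed by hardware.

    dis, dp, du, dms, dmr are MMCR0[0-4]; msr_pr and msr_pm are the
    MSR[PR] (1 = user mode) and MSR[PM] (1 = marked process) bits.
    """
    if dis:                      # counting disabled unconditionally
        return False
    if dp and not msr_pr:        # disabled while in supervisor mode
        return False
    if du and msr_pr:            # disabled while in user mode
        return False
    if dms and msr_pm:           # disabled while the process is marked
        return False
    if dmr and not msr_pm:       # disabled while the process is not marked
        return False
    return True


if __name__ == "__main__":
    # "(Marked and user) only": disable supervisor mode and unmarked processes.
    for pr in (0, 1):
        for pm in (0, 1):
            state = counting_enabled(0, 1, 0, 0, 1, msr_pr=pr, msr_pm=pm)
            print(f"MSR[PR]={pr} MSR[PM]={pm}: counting enabled = {state}")
```

Under this model, each of the eight monitorable states listed above corresponds to setting one or two of the DP, DU, DMS, and DMR bits.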


The performance monitor counters count specified events and are used to generate
performance monitor exceptions when an overflow (most-significant bit is a 1) situation
occurs. The MPC750 performance monitor has four 32-bit registers that can count up to
0x7FFFFFFF (2,147,483,647 in decimal) before overflowing. Bit 0 of the registers is used
to determine when an interrupt condition exists.


11.4 Event Selection 


Event selection is handled through MMCR0 and MMCR1, described in Table 11-2 and
Table 11-3, respectively. Event selection is described as follows:

• The four event-select fields in MMCR0 and MMCR1 are as follows:

  - MMCR0[19-25] PMC1SELECT: PMC1 input selector, 128 events selectable;
    25 defined. See Table 11-5.
  - MMCR0[26-31] PMC2SELECT: PMC2 input selector, 64 events selectable;
    21 defined. See Table 11-6.
  - MMCR1[0-4] PMC3SELECT: PMC3 input selector, 32 events selectable. See
    Table 11-7.
  - MMCR1[5-9] PMC4SELECT: PMC4 input selector, 32 events selectable. See
    Table 11-8.

• In the tables, a correlation is established between each counter, the events to be
  traced, and the pattern required for the desired selection.

• The first five events are common to all four counters and are considered to be
  reference events. These are as follows:

  - 00000: Register holds current value
  - 00001: Number of processor cycles
  - 00010: Number of completed instructions, not including folded branches
  - 00011: Number of TBL bit transitions from 0 to 1 of specified bits in time base
    lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47,
    1 = 51, 2 = 55, 3 = 63.
  - 00100: Number of instructions dispatched. 0, 1, or 2 per cycle

• Some events can have multiple occurrences per cycle, and therefore need two or
  three bits to represent them.


11.5 Warnings 


The following warnings should be noted:

• Only those loads and stores in queue position 0 of their respective load/store queues
  are monitored when a threshold event is selected in PMC1.

• The MPC750 cannot accurately track threshold events with respect to the following
  types of loads and stores:

  - Unaligned load and store operations that cross a word boundary
  - Load and store multiple operations
  - Load and store string operations


Appendix A
PowerPC Instruction Set Listings


This appendix lists the MPC750 microprocessor’s instruction set as well as the additional
PowerPC instructions not implemented in the MPC750. Instructions are sorted by
mnemonic, opcode, function, and form. Also included in this appendix is a quick reference
table that contains general information, such as the architecture level, privilege level, and
form, and indicates if the instruction is 64-bit and optional. Note that the MPC750 is a
32-bit microprocessor and does not implement any 64-bit instructions.


Note that split fields, which represent the concatenation of sequences from left to right, are
shown in lowercase. For more information refer to Chapter 8, “Instruction Set,” in the
Programming Environments Manual.
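
The listings in this appendix describe each instruction by the fields it places in bits 0-31 of the instruction word, with bit 0 as the most-significant bit. The Python sketch below shows how those positions could be extracted from a 32-bit word. It is an illustration only: it assumes the common layout with three 5-bit fields in bits 6-20 and a 10-bit value in bits 21-30, which does not apply to every form (immediate and branch forms divide bits 16-31 differently, and in forms with an OE bit the 10-bit value includes OE in bit 21). The function names and the sample words are hypothetical choices made for the example.

```python
def decode_fields(word):
    """Split a 32-bit instruction word at the common field boundaries."""
    return {
        "opcode": (word >> 26) & 0x3F,       # bits 0-5, primary opcode
        "bits_6_10": (word >> 21) & 0x1F,    # for example D, S, BO, or crbD
        "bits_11_15": (word >> 16) & 0x1F,   # for example A, BI, or crbA
        "bits_16_20": (word >> 11) & 0x1F,   # for example B or crbB
        "bits_21_30": (word >> 1) & 0x3FF,   # extended opcode where present
        "bit_31": word & 1,                  # Rc or LK where present
    }


def spr_number(word):
    """Reassemble the split spr field of an mfspr/mtspr word.

    Assumes the low-order half of the SPR number is in bits 11-15 and
    the high-order half is in bits 16-20.
    """
    fields = decode_fields(word)
    return (fields["bits_16_20"] << 5) | fields["bits_11_15"]


if __name__ == "__main__":
    for word in (0x7C642A14, 0x7C0802A6):
        print(f"0x{word:08X}: {decode_fields(word)}")
    print("SPR number in 0x7C0802A6:", spr_number(0x7C0802A6))
```

The split spr field is one example of the lowercase split fields mentioned above: its two halves appear in the instruction word in swapped order relative to the SPR number.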


A.1 Instructions Sorted by Mnemonic


Table A-1 lists the instructions implemented in the PowerPC architecture in alphabetical
order by mnemonic. Each row gives the primary opcode (bits 0-5) followed by the
remaining fields of the instruction (bits 6-31) in left-to-right order. A number in
parentheses after a mnemonic refers to the notes that follow the table.

Key: Reserved bits

Table A-1. Complete Instruction List Sorted by Mnemonic

Name            0-5   6-31
addx            31    D A B OE 266 Rc
addcx           31    D A B OE 10 Rc
addex           31    D A B OE 138 Rc
addi            14    D A SIMM
addic           12    D A SIMM
addic.          13    D A SIMM
addis           15    D A SIMM
addmex          31    D A 00000 OE 234 Rc
addzex          31    D A 00000 OE 202 Rc
andx            31    S A B 28 Rc
andcx           31    S A B 60 Rc
andi.           28    S A UIMM
andis.          29    S A UIMM
bx              18    LI AA LK
bcx             16    BO BI BD AA LK
bcctrx          19    BO BI 00000 528 LK
bclrx           19    BO BI 00000 16 LK
cmp             31    crfD 0 L A B 0 0
cmpi            11    crfD 0 L A SIMM
cmpl            31    crfD 0 L A B 32 0
cmpli           10    crfD 0 L A UIMM
cntlzwx         31    S A 00000 26 Rc
crand           19    crbD crbA crbB 257 0
crandc          19    crbD crbA crbB 129 0
creqv           19    crbD crbA crbB 289 0
crnand          19    crbD crbA crbB 225 0
crnor           19    crbD crbA crbB 33 0
cror            19    crbD crbA crbB 449 0
crorc           19    crbD crbA crbB 417 0
crxor           19    crbD crbA crbB 193 0
dcba (1)        31    00000 A B 758 0
dcbf            31    00000 A B 86 0
dcbi (2)        31    00000 A B 470 0
dcbst           31    00000 A B 54 0
dcbt            31    00000 A B 278 0
dcbtst          31    00000 A B 246 0
dcbz            31    00000 A B 1014 0
divwx           31    D A B OE 491 Rc
divwux          31    D A B OE 459 Rc
eciwx           31    D A B 310 0
ecowx           31    S A B 438 0
eieio           31    00000 00000 00000 854 0
eqvx            31    S A B 284 Rc
extsbx          31    S A 00000 954 Rc
extshx          31    S A 00000 922 Rc
fabsx           63    D 00000 B 264 Rc
faddx           63    D A B 00000 21 Rc
faddsx          59    D A B 00000 21 Rc
fcmpo           63    crfD 00 A B 32 0
fcmpu           63    crfD 00 A B 0 0
fctiwx          63    D 00000 B 14 Rc
fctiwzx         63    D 00000 B 15 Rc
fdivx           63    D A B 00000 18 Rc
fdivsx          59    D A B 00000 18 Rc
fmaddx          63    D A B C 29 Rc
fmaddsx         59    D A B C 29 Rc
fmrx            63    D 00000 B 72 Rc
fmsubx          63    D A B C 28 Rc
fmsubsx         59    D A B C 28 Rc
fmulx           63    D A 00000 C 25 Rc
fmulsx          59    D A 00000 C 25 Rc
fnabsx          63    D 00000 B 136 Rc
fnegx           63    D 00000 B 40 Rc
fnmaddx         63    D A B C 31 Rc
fnmaddsx        59    D A B C 31 Rc
fnmsubx         63    D A B C 30 Rc
fnmsubsx        59    D A B C 30 Rc
fresx (1)       59    D 00000 B 00000 24 Rc
frspx           63    D 00000 B 12 Rc
frsqrtex (1)    63    D 00000 B 00000 26 Rc
fselx (1)       63    D A B C 23 Rc
fsqrtx (1)      63    D 00000 B 00000 22 Rc
fsqrtsx (1)     59    D 00000 B 00000 22 Rc
fsubx           63    D A B 00000 20 Rc
fsubsx          59    D A B 00000 20 Rc
icbi            31    00000 A B 982 0
isync           19    00000 00000 00000 150 0
lbz             34    D A d
lbzu            35    D A d
lbzux           31    D A B 119 0
lbzx            31    D A B 87 0
lfd             50    D A d
lfdu            51    D A d
lfdux           31    D A B 631 0
lfdx            31    D A B 599 0
lfs             48    D A d
lfsu            49    D A d
lfsux           31    D A B 567 0
lfsx            31    D A B 535 0
lha             42    D A d
lhau            43    D A d
lhaux           31    D A B 375 0
lhax            31    D A B 343 0
lhbrx           31    D A B 790 0
lhz             40    D A d
lhzu            41    D A d
lhzux           31    D A B 311 0
lhzx            31    D A B 279 0
lmw (3)         46    D A d
lswi (3)        31    D A NB 597 0
lswx (3)        31    D A B 533 0
lwarx           31    D A B 20 0
lwbrx           31    D A B 534 0
lwz             32    D A d
lwzu            33    D A d
lwzux           31    D A B 55 0
lwzx            31    D A B 23 0
mcrf            19    crfD 00 crfS 00 00000 0 0
mcrfs           63    crfD 00 crfS 00 00000 64 0
mcrxr           31    crfD 00 00000 00000 512 0
mfcr            31    D 00000 00000 19 0
mffsx           63    D 00000 00000 583 Rc
mfmsr (2)       31    D 00000 00000 83 0
mfspr (4)       31    D spr 339 0
mfsr (2)        31    D 0 SR 00000 595 0
mfsrin (2)      31    D 00000 B 659 0
mftb            31    D tbr 371 0
mtcrf           31    S 0 CRM 0 144 0
mtfsb0x         63    crbD 00000 00000 70 Rc
mtfsb1x         63    crbD 00000 00000 38 Rc
mtfsfx          63    0 FM 0 B 711 Rc
mtfsfix         63    crfD 00 00000 IMM 0 134 Rc
mtmsr (2)       31    S 00000 00000 146 0
mtspr (4)       31    S spr 467 0
mtsr (2)        31    S 0 SR 00000 210 0
mtsrin (2)      31    S 00000 B 242 0
mulhwx          31    D A B 0 75 Rc
mulhwux         31    D A B 0 11 Rc
mulli           7     D A SIMM
mullwx          31    D A B OE 235 Rc
nandx           31    S A B 476 Rc
negx            31    D A 00000 OE 104 Rc
norx            31    S A B 124 Rc
orx             31    S A B 444 Rc
orcx            31    S A B 412 Rc
ori             24    S A UIMM
oris            25    S A UIMM
rfi             19    00000 00000 00000 50 0
rlwimix         20    S A SH MB ME Rc
rlwinmx         21    S A SH MB ME Rc
rlwnmx          23    S A B MB ME Rc
sc              17    00000 00000 00000000000000 1 0
slwx            31    S A B 24 Rc
srawx           31    S A B 792 Rc
srawix          31    S A SH 824 Rc
srwx            31    S A B 536 Rc
stb             38    S A d
stbu            39    S A d
stbux           31    S A B 247 0
stbx            31    S A B 215 0
stfd            54    S A d
stfdu           55    S A d
stfdux          31    S A B 759 0
stfdx           31    S A B 727 0
stfiwx          31    S A B 983 0
stfs            52    S A d
stfsu           53    S A d
stfsux          31    S A B 695 0
stfsx           31    S A B 663 0
sth             44    S A d
sthbrx          31    S A B 918 0
sthu            45    S A d
sthux           31    S A B 439 0
sthx            31    S A B 407 0
stmw (3)        47    S A d
stswi (3)       31    S A NB 725 0
stswx (3)       31    S A B 661 0
stw             36    S A d
stwbrx          31    S A B 662 0
stwcx.          31    S A B 150 1
stwu            37    S A d
stwux           31    S A B 183 0
stwx            31    S A B 151 0
subfx           31    D A B OE 40 Rc
subfcx          31    D A B OE 8 Rc
subfex          31    D A B OE 136 Rc
subfic          08    D A SIMM
subfmex         31    D A 00000 OE 232 Rc
subfzex         31    D A 00000 OE 200 Rc
sync            31    00000 00000 00000 598 0
tlbia (1, 2)    31    00000 00000 00000 370 0
tlbie (1, 2)    31    00000 00000 B 306 0
tlbsync (1, 2)  31    00000 00000 00000 566 0
tw              31    TO A B 4 0
twi             03    TO A SIMM
xorx            31    S A B 316 Rc
xori            26    S A UIMM
xoris           27    S A UIMM

Notes:

1 Optional instruction
2 Supervisor-level instruction
3 Load/store string/multiple instruction
4 Supervisor- and user-level instruction


A.2 Instructions Sorted by Opcode 


Table A-2 lists the instructions defined in the PowerPC architecture in numeric order by 
































opcode 
Key: 
Reserved bits 
Table A-2. Complete Instruction List Sorted by Opcode 
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 

twi 000011 TO A SIMM
mulli 000111 D A SIMM
subfic 001000 D A SIMM
cmpli 001010 crfD 0 L A UIMM
cmpi 001011 crfD 0 L A SIMM
addic 001100 D A SIMM
addic. 001101 D A SIMM
addi 001110 D A SIMM
addis 001111 D A SIMM
bcx 010000 BO BI BD AA LK
sc 010001 00000 00000 000000000000000 1 0
bx 010010 LI AA LK
mcrf 010011 crfD 00 crfS 00 00000 0000000000 0
bclrx 010011 BO BI 00000 0000010000 LK
crnor 010011 crbD crbA crbB 0000100001 0
rfi 1 010011 00000 00000 00000 0000110010 0
crandc 010011 crbD crbA crbB 0010000001 0
isync 010011 00000 00000 00000 0010010110 0
crxor 010011 crbD crbA crbB 0011000001 0
crnand 010011 crbD crbA crbB 0011100001 0
crand 010011 crbD crbA crbB 0100000001 0
creqv 010011 crbD crbA crbB 0100100001 0
crorc 010011 crbD crbA crbB 0110100001 0
cror 010011 crbD crbA crbB 0111000001 0
bcctrx 010011 BO BI 00000 1000010000 LK
rlwimix 010100 S A SH MB ME Rc
rlwinmx 010101 S A SH MB ME Rc
rlwnmx 010111 S A B MB ME Rc
ori 011000 S A UIMM
oris 011001 S A UIMM
xori 011010 S A UIMM
xoris 011011 S A UIMM
andi. 011100 S A UIMM
andis. 011101 S A UIMM
cmp 011111 crfD 0 L A B 0000000000 0
tw 011111 TO A B 0000000100 0
subfcx 011111 D A B OE 0000001000 Rc
addcx 011111 D A B OE 0000001010 Rc
mulhwux 011111 D A B 0 0000001011 Rc
mfcr 011111 D 00000 00000 0000010011 0
lwarx 011111 D A B 0000010100 0
lwzx 011111 D A B 0000010111 0
slwx 011111 S A B 0000011000 Rc
cntlzwx 011111 S A 00000 0000011010 Rc
andx 011111 S A B 0000011100 Rc
cmpl 011111 crfD 0 L A B 0000100000 0
subfx 011111 D A B OE 0000101000 Rc
dcbst 011111 00000 A B 0000110110 0
lwzux 011111 D A B 0000110111 0
andcx 011111 S A B 0000111100 Rc
mulhwx 011111 D A B 0 0001001011 Rc
mfmsr 1 011111 D 00000 00000 0001010011 0
dcbf 011111 00000 A B 0001010110 0
lbzx 011111 D A B 0001010111 0
negx 011111 D A 00000 OE 0001101000 Rc
lbzux 011111 D A B 0001110111 0
norx 011111 S A B 0001111100 Rc
subfex 011111 D A B OE 0010001000 Rc
addex 011111 D A B OE 0010001010 Rc
mtcrf 011111 S 0 CRM 0 0010010000 0
mtmsr 1 011111 S 00000 00000 0010010010 0
stwcx. 011111 S A B 0010010110 1
stwx 011111 S A B 0010010111 0
stwux 011111 S A B 0010110111 0
subfzex 011111 D A 00000 OE 0011001000 Rc
addzex 011111 D A 00000 OE 0011001010 Rc
mtsr 1 011111 S 0 SR 00000 0011010010 0
stbx 011111 S A B 0011010111 0
subfmex 011111 D A 00000 OE 0011101000 Rc
addmex 011111 D A 00000 OE 0011101010 Rc
mullwx 011111 D A B OE 0011101011 Rc
mtsrin 1 011111 S 00000 B 0011110010 0
dcbtst 011111 00000 A B 0011110110 0
stbux 011111 S A B 0011110111 0
addx 011111 D A B OE 0100001010 Rc
dcbt 011111 00000 A B 0100010110 0
lhzx 011111 D A B 0100010111 0
eqvx 011111 S A B 0100011100 Rc
tlbie 1,2 011111 00000 00000 B 0100110010 0
eciwx 011111 D A B 0100110110 0
lhzux 011111 D A B 0100110111 0
xorx 011111 S A B 0100111100 Rc
mfspr 3 011111 D spr 0101010011 0
lhax 011111 D A B 0101010111 0
tlbia 1,2,4 011111 00000 00000 00000 0101110010 0
mftb 011111 D tbr 0101110011 0
lhaux 011111 D A B 0101110111 0
sthx 011111 S A B 0110010111 0
orcx 011111 S A B 0110011100 Rc
ecowx 011111 S A B 0110110110 0
sthux 011111 S A B 0110110111 0
orx 011111 S A B 0110111100 Rc
divwux 011111 D A B OE 0111001011 Rc
mtspr 3 011111 S spr 0111010011 0
dcbi 1 011111 00000 A B 0111010110 0
nandx 011111 S A B 0111011100 Rc
divwx 011111 D A B OE 0111101011 Rc
mcrxr 011111 crfD 00 00000 00000 1000000000 0
lswx 5 011111 D A B 1000010101 0
lwbrx 011111 D A B 1000010110 0
lfsx 011111 D A B 1000010111 0
srwx 011111 S A B 1000011000 Rc
tlbsync 1,2 011111 00000 00000 00000 1000110110 0
lfsux 011111 D A B 1000110111 0
mfsr 1 011111 D 0 SR 00000 1001010011 0
lswi 5 011111 D A NB 1001010101 0
sync 011111 00000 00000 00000 1001010110 0
lfdx 011111 D A B 1001010111 0
lfdux 011111 D A B 1001110111 0
mfsrin 1 011111 D 00000 B 1010010011 0
stswx 5 011111 S A B 1010010101 0
stwbrx 011111 S A B 1010010110 0
stfsx 011111 S A B 1010010111 0
stfsux 011111 S A B 1010110111 0
stswi 5 011111 S A NB 1011010101 0
stfdx 011111 S A B 1011010111 0
dcba 2,4 011111 00000 A B 1011110110 0
stfdux 011111 S A B 1011110111 0
lhbrx 011111 D A B 1100010110 0
srawx 011111 S A B 1100011000 Rc
srawix 011111 S A SH 1100111000 Rc
eieio 011111 00000 00000 00000 1101010110 0
sthbrx 011111 S A B 1110010110 0
extshx 011111 S A 00000 1110011010 Rc
extsbx 011111 S A 00000 1110111010 Rc
icbi 011111 00000 A B 1111010110 0
stfiwx 2 011111 S A B 1111010111 0
dcbz 011111 00000 A B 1111110110 0
lwz 100000 D A d
lwzu 100001 D A d
lbz 100010 D A d
lbzu 100011 D A d
stw 100100 S A d
stwu 100101 S A d
stb 100110 S A d
stbu 100111 S A d
lhz 101000 D A d
lhzu 101001 D A d
lha 101010 D A d
lhau 101011 D A d
sth 101100 S A d
sthu 101101 S A d
lmw 5 101110 D A d
stmw 5 101111 S A d
lfs 110000 D A d
lfsu 110001 D A d
lfd 110010 D A d
lfdu 110011 D A d
stfs 110100 S A d
stfsu 110101 S A d
stfd 110110 S A d
stfdu 110111 S A d
fdivsx 111011 D A B 00000 10010 Rc
fsubsx 111011 D A B 00000 10100 Rc
faddsx 111011 D A B 00000 10101 Rc
fsqrtsx 2,4 111011 D 00000 B 00000 10110 Rc
fresx 2 111011 D 00000 B 00000 11000 Rc
fmulsx 111011 D A 00000 C 11001 Rc
fmsubsx 111011 D A B C 11100 Rc
fmaddsx 111011 D A B C 11101 Rc
fnmsubsx 111011 D A B C 11110 Rc
fnmaddsx 111011 D A B C 11111 Rc
fcmpu 111111 crfD 00 A B 0000000000 0
frspx 111111 D 00000 B 0000001100 Rc
fctiwx 111111 D 00000 B 0000001110 Rc
fctiwzx 111111 D 00000 B 0000001111 Rc
fdivx 111111 D A B 00000 10010 Rc
fsubx 111111 D A B 00000 10100 Rc
faddx 111111 D A B 00000 10101 Rc
fsqrtx 2,4 111111 D 00000 B 00000 10110 Rc
fselx 2 111111 D A B C 10111 Rc
fmulx 111111 D A 00000 C 11001 Rc
fmsubx 111111 D A B C 11100 Rc
fmaddx 111111 D A B C 11101 Rc
fnmsubx 111111 D A B C 11110 Rc
fnmaddx 111111 D A B C 11111 Rc
fcmpo 111111 crfD 00 A B 0000100000 0
mtfsb1x 111111 crbD 00000 00000 0000100110 Rc
fnegx 111111 D 00000 B 0000101000 Rc
mcrfs 111111 crfD 00 crfS 00 00000 0001000000 0
mtfsb0x 111111 crbD 00000 00000 0001000110 Rc
fmrx 111111 D 00000 B 0001001000 Rc
mtfsfix 111111 crfD 00 00000 IMM 0 0010000110 Rc
fnabsx 111111 D 00000 B 0010001000 Rc
fabsx 111111 D 00000 B 0100001000 Rc
mffsx 111111 D 00000 00000 1001000111 Rc
mtfsfx 111111 0 FM 0 B 1011000111 Rc
Notes:
1 Supervisor-level instruction
2 Optional instruction
3 Supervisor- and user-level instruction
4 32-bit instruction not implemented by the MPC750
5 Load/store string/multiple instruction
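
As a rough illustration of how the bit positions in the header row of Table A-2 map onto a 32-bit instruction word, the following Python sketch extracts fields using the table's numbering, in which bit 0 is the most significant bit. The helper names `field` and `decode_opcodes` are hypothetical and are not part of any vendor tool; the sketch assumes the instruction word is already available as an integer.

```python
def field(word, first, last):
    """Return bits first..last of a 32-bit word, where bit 0 is the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode_opcodes(word):
    """Return (primary opcode, 10-bit extended opcode field at bits 21-30).

    The second value is meaningful only for forms that define an extended
    opcode there (for example X-form and XL-form); D-form instructions use
    those bits as part of the 16-bit immediate.
    """
    return field(word, 0, 5), field(word, 21, 30)


# lwarx r3,r4,r5: primary opcode 31 (011111), extended opcode 20 (0000010100)
word = (31 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (20 << 1)
print(decode_opcodes(word))
```

For XO-form arithmetic instructions the same `field` helper can be called with bits 22-30, since bit 21 holds OE in that form.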


A.3 Instructions Grouped by Functional Categories

Table A-3 through Table A-30 list the PowerPC instructions grouped by function.

Key: Reserved bits

Table A-3. Integer Arithmetic Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
addx 31 D A B OE 266 Rc
addcx 31 D A B OE 10 Rc
addex 31 D A B OE 138 Rc
addi 14 D A SIMM
addic 12 D A SIMM
addic. 13 D A SIMM
addis 15 D A SIMM
addmex 31 D A 00000 OE 234 Rc
addzex 31 D A 00000 OE 202 Rc
divwx 31 D A B OE 491 Rc
divwux 31 D A B OE 459 Rc
mulhwx 31 D A B 0 75 Rc
mulhwux 31 D A B 0 11 Rc
mulli 07 D A SIMM
mullwx 31 D A B OE 235 Rc
negx 31 D A 00000 OE 104 Rc
subfx 31 D A B OE 40 Rc
subfcx 31 D A B OE 8 Rc
subfic 08 D A SIMM
subfex 31 D A B OE 136 Rc
subfmex 31 D A 00000 OE 232 Rc
subfzex 31 D A 00000 OE 200 Rc

Table A-4. Integer Compare Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
cmp 31 crfD 0 L A B 0 0
cmpi 11 crfD 0 L A SIMM
cmpl 31 crfD 0 L A B 32 0
cmpli 10 crfD 0 L A UIMM

Table A-5. Integer Logical Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
andx 31 S A B 28 Rc
andcx 31 S A B 60 Rc
andi. 28 S A UIMM
andis. 29 S A UIMM
cntlzwx 31 S A 00000 26 Rc
eqvx 31 S A B 284 Rc
extsbx 31 S A 00000 954 Rc
extshx 31 S A 00000 922 Rc
nandx 31 S A B 476 Rc
norx 31 S A B 124 Rc
orx 31 S A B 444 Rc
orcx 31 S A B 412 Rc
ori 24 S A UIMM
oris 25 S A UIMM
xorx 31 S A B 316 Rc
xori 26 S A UIMM
xoris 27 S A UIMM

Table A-6. Integer Rotate Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
rlwimix 20 S A SH MB ME Rc
rlwinmx 21 S A SH MB ME Rc
rlwnmx 23 S A B MB ME Rc

Table A-7. Integer Shift Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
slwx 31 S A B 24 Rc
srawx 31 S A B 792 Rc
srawix 31 S A SH 824 Rc
srwx 31 S A B 536 Rc

Table A-8. Floating-Point Arithmetic Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
faddx 63 D A B 00000 21 Rc
faddsx 59 D A B 00000 21 Rc
fdivx 63 D A B 00000 18 Rc
fdivsx 59 D A B 00000 18 Rc
fmulx 63 D A 00000 C 25 Rc
fmulsx 59 D A 00000 C 25 Rc
fresx 1 59 D 00000 B 00000 24 Rc
frsqrtex 1 63 D 00000 B 00000 26 Rc
fsubx 63 D A B 00000 20 Rc
fsubsx 59 D A B 00000 20 Rc
fselx 1 63 D A B C 23 Rc
fsqrtx 1,2 63 D 00000 B 00000 22 Rc
fsqrtsx 1,2 59 D 00000 B 00000 22 Rc
Notes:
1 Optional instruction
2 32-bit instruction not implemented by the MPC750

Table A-9. Floating-Point Multiply-Add Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
fmaddx 63 D A B C 29 Rc
fmaddsx 59 D A B C 29 Rc
fmsubx 63 D A B C 28 Rc
fmsubsx 59 D A B C 28 Rc
fnmaddx 63 D A B C 31 Rc
fnmaddsx 59 D A B C 31 Rc
fnmsubx 63 D A B C 30 Rc
fnmsubsx 59 D A B C 30 Rc

Table A-10. Floating-Point Rounding and Conversion Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
fctiwx 63 D 00000 B 14 Rc
fctiwzx 63 D 00000 B 15 Rc
frspx 63 D 00000 B 12 Rc

Table A-11. Floating-Point Compare Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
fcmpo 63 crfD 00 A B 32 0
fcmpu 63 crfD 00 A B 0 0

Table A-12. Floating-Point Status and Control Register Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mcrfs 63 crfD 00 crfS 00 00000 64 0
mffsx 63 D 00000 00000 583 Rc
mtfsb0x 63 crbD 00000 00000 70 Rc
mtfsb1x 63 crbD 00000 00000 38 Rc
mtfsfx 63 0 FM 0 B 711 Rc
mtfsfix 63 crfD 00 00000 IMM 0 134 Rc

Table A-13. Integer Load Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lbz 34 D A d
lbzu 35 D A d
lbzux 31 D A B 119 0
lbzx 31 D A B 87 0
lha 42 D A d
lhau 43 D A d
lhaux 31 D A B 375 0
lhax 31 D A B 343 0
lhz 40 D A d
lhzu 41 D A d
lhzux 31 D A B 311 0
lhzx 31 D A B 279 0
lwz 32 D A d
lwzu 33 D A d
lwzux 31 D A B 55 0
lwzx 31 D A B 23 0

Table A-14. Integer Store Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
stb 38 S A d
stbu 39 S A d
stbux 31 S A B 247 0
stbx 31 S A B 215 0
sth 44 S A d
sthu 45 S A d
sthux 31 S A B 439 0
sthx 31 S A B 407 0
stw 36 S A d
stwu 37 S A d
stwux 31 S A B 183 0
stwx 31 S A B 151 0

Table A-15. Integer Load and Store with Byte Reverse Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lhbrx 31 D A B 790 0
lwbrx 31 D A B 534 0
sthbrx 31 S A B 918 0
stwbrx 31 S A B 662 0

Table A-16. Integer Load and Store Multiple Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lmw 46 D A d
stmw 47 S A d
Note:

Table A-17. Integer Load and Store String Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lswi 31 D A NB 597 0
lswx 31 D A B 533 0
stswi 31 S A NB 725 0
stswx 31 S A B 661 0

Table A-18. Memory Synchronization Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
eieio 31 00000 00000 00000 854 0
isync 19 00000 00000 00000 150 0
lwarx 31 D A B 20 0
stwcx. 31 S A B 150 1
sync 31 00000 00000 00000 598 0

Table A-19. Floating-Point Load Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lfd 50 D A d
lfdu 51 D A d
lfdux 31 D A B 631 0
lfdx 31 D A B 599 0
lfs 48 D A d
lfsu 49 D A d
lfsux 31 D A B 567 0
lfsx 31 D A B 535 0

Table A-20. Floating-Point Store Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
stfd 54 S A d
stfdu 55 S A d
stfdux 31 S A B 759 0
stfdx 31 S A B 727 0
stfiwx 1 31 S A B 983 0
stfs 52 S A d
stfsu 53 S A d
stfsux 31 S A B 695 0
stfsx 31 S A B 663 0
Note:
1 Optional instruction

Table A-21. Floating-Point Move Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
fabsx 63 D 00000 B 264 Rc
fmrx 63 D 00000 B 72 Rc
fnabsx 63 D 00000 B 136 Rc
fnegx 63 D 00000 B 40 Rc

Table A-22. Branch Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
bx 18 LI AA LK
bcx 16 BO BI BD AA LK
bcctrx 19 BO BI 00000 528 LK
bclrx 19 BO BI 00000 16 LK

Table A-23. Condition Register Logical Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
crand 19 crbD crbA crbB 257 0
crandc 19 crbD crbA crbB 129 0
creqv 19 crbD crbA crbB 289 0
crnand 19 crbD crbA crbB 225 0
crnor 19 crbD crbA crbB 33 0
cror 19 crbD crbA crbB 449 0
crorc 19 crbD crbA crbB 417 0
crxor 19 crbD crbA crbB 193 0
mcrf 19 crfD 00 crfS 00 00000 0 0

Table A-24. System Linkage Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
rfi 1 19 00000 00000 00000 50 0
sc 17 00000 00000 000000000000000 1 0
Note:
1 Supervisor-level instruction

Table A-25. Trap Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
tw 31 TO A B 4 0
twi 03 TO A SIMM

Table A-26. Processor Control Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mcrxr 31 crfD 00 00000 00000 512 0
mfcr 31 D 00000 00000 19 0
mfmsr 1 31 D 00000 00000 83 0
mfspr 2 31 D spr 339 0
mftb 31 D tbr 371 0
mtcrf 31 S 0 CRM 0 144 0
mtmsr 1 31 S 00000 00000 146 0
mtspr 2 31 S spr 467 0
Notes:
1 Supervisor-level instruction
2 Supervisor- and user-level instruction

Table A-27. Cache Management Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
dcba 1,3 31 00000 A B 758 0
dcbf 31 00000 A B 86 0
dcbi 2 31 00000 A B 470 0
dcbst 31 00000 A B 54 0
dcbt 31 00000 A B 278 0
dcbtst 31 00000 A B 246 0
dcbz 31 00000 A B 1014 0
icbi 31 00000 A B 982 0
Notes:
1 Optional instruction
2 Supervisor-level instruction
3 32-bit instruction not implemented by the MPC750

Table A-28. Segment Register Manipulation Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mfsr 1 31 D 0 SR 00000 595 0
mfsrin 1 31 D 00000 B 659 0
mtsr 1 31 S 0 SR 00000 210 0
mtsrin 1 31 S 00000 B 242 0
Note:
1 Supervisor-level instruction

Table A-29. Lookaside Buffer Management Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
tlbia 1,2 31 00000 00000 00000 370 0
tlbie 1,2 31 00000 00000 B 306 0
tlbsync 1,2 31 00000 00000 00000 566 0
Notes:
1 Supervisor-level instruction
2 Optional instruction

Table A-30. External Control Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
eciwx 31 D A B 310 0
ecowx 31 S A B 438 0

A.4 Instructions Sorted by Form

Table A-31 through Table A-41 list the PowerPC instructions grouped by form.

Key: Reserved bits

Table A-31. I-Form
OPCD LI AA LK
Specific Instruction
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
bx 18 LI AA LK

Table A-32. B-Form
OPCD BO BI BD AA LK
Specific Instruction
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
bcx 16 BO BI BD AA LK

Table A-33. SC-Form
OPCD 00000 00000 000000000000000 1 0
Specific Instruction
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
sc 17 00000 00000 000000000000000 1 0

Table A-34. D-Form
OPCD D A d
OPCD D A SIMM
OPCD S A d
OPCD S A UIMM
OPCD crfD 0 L A SIMM
OPCD crfD 0 L A UIMM
OPCD TO A SIMM
Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
addi 14 D A SIMM
addic 12 D A SIMM
addic. 13 D A SIMM
addis 15 D A SIMM
andi. 28 S A UIMM
andis. 29 S A UIMM
cmpi 11 crfD 0 L A SIMM
cmpli 10 crfD 0 L A UIMM
lbz 34 D A d
lbzu 35 D A d
lfd 50 D A d
lfdu 51 D A d
lfs 48 D A d
lfsu 49 D A d
lha 42 D A d
lhau 43 D A d
lhz 40 D A d
lhzu 41 D A d
lmw 1 46 D A d
lwz 32 D A d
lwzu 33 D A d
mulli 07 D A SIMM
ori 24 S A UIMM
oris 25 S A UIMM
stb 38 S A d
stbu 39 S A d
stfd 54 S A d
stfdu 55 S A d
stfs 52 S A d
stfsu 53 S A d
sth 44 S A d
sthu 45 S A d
stmw 1 47 S A d
stw 36 S A d
stwu 37 S A d
subfic 08 D A SIMM
twi 03 TO A SIMM
xori 26 S A UIMM
xoris 27 S A UIMM
Note:
1 Load/store string/multiple instruction
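
The D-form layout in Table A-34 can be sketched as a small packing routine. This is an illustrative Python sketch only, assuming the field positions implied by the table header (OPCD in bits 0-5, D or S in bits 6-10, A in bits 11-15, and a 16-bit immediate in bits 16-31); the function name `encode_d_form` is hypothetical.

```python
def encode_d_form(opcd, rd, ra, imm):
    """Pack a D-form word: OPCD (0-5), D or S (6-10), A (11-15), immediate (16-31)."""
    if not 0 <= opcd < 64:
        raise ValueError("primary opcode must fit in 6 bits")
    if not (0 <= rd < 32 and 0 <= ra < 32):
        raise ValueError("register fields must fit in 5 bits")
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    # Masking with 0xFFFF stores a negative SIMM or d in two's complement.
    return (opcd << 26) | (rd << 21) | (ra << 16) | (imm & 0xFFFF)


# addi r3,r1,16 uses opcode 14; lwz r3,8(r1) uses opcode 32 (Table A-34)
print(hex(encode_d_form(14, 3, 1, 16)))
print(hex(encode_d_form(32, 3, 1, 8)))
```

The range check accepts both signed and unsigned 16-bit values because the same template serves SIMM, UIMM, and d fields; a stricter tool would validate per instruction.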


Table A-35. X-Form
OPCD D A B XO 0
OPCD D A NB XO 0
OPCD D 00000 B XO 0
OPCD D 00000 00000 XO 0
OPCD D 0 SR 00000 XO 0
OPCD S A B XO Rc
OPCD S A B XO 1
OPCD S A B XO 0
OPCD S A NB XO 0
OPCD S A 00000 XO Rc
OPCD S 00000 B XO 0
OPCD S 00000 00000 XO 0
OPCD S 0 SR 00000 XO 0
OPCD S A SH XO Rc
OPCD crfD 0 L A B XO 0
OPCD crfD 00 A B XO 0
OPCD crfD 00 crfS 00 00000 XO 0
OPCD crfD 00 00000 00000 XO 0
OPCD crfD 00 00000 IMM 0 XO Rc
OPCD TO A B XO 0
OPCD D 00000 B XO Rc
OPCD D 00000 00000 XO Rc
OPCD crbD 00000 00000 XO Rc
OPCD 00000 A B XO 0
OPCD 00000 00000 B XO 0
OPCD 00000 00000 00000 XO 0
Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
andx 31 S A B 28 Rc
andcx 31 S A B 60 Rc
cmp 31 crfD 0 L A B 0 0
cmpl 31 crfD 0 L A B 32 0
cntlzwx 31 S A 00000 26 Rc
dcba 1 31 00000 A B 758 0
dcbf 31 00000 A B 86 0
dcbi 2 31 00000 A B 470 0
dcbst 31 00000 A B 54 0
dcbt 31 00000 A B 278 0
dcbtst 31 00000 A B 246 0
dcbz 31 00000 A B 1014 0
eciwx 31 D A B 310 0
ecowx 31 S A B 438 0
eieio 31 00000 00000 00000 854 0
eqvx 31 S A B 284 Rc
extsbx 31 S A 00000 954 Rc
extshx 31 S A 00000 922 Rc
fabsx 63 D 00000 B 264 Rc
fcmpo 63 crfD 00 A B 32 0
fcmpu 63 crfD 00 A B 0 0
fctiwx 63 D 00000 B 14 Rc
fctiwzx 63 D 00000 B 15 Rc
fmrx 63 D 00000 B 72 Rc
fnabsx 63 D 00000 B 136 Rc
fnegx 63 D 00000 B 40 Rc
frspx 63 D 00000 B 12 Rc
icbi 31 00000 A B 982 0
lbzux 31 D A B 119 0
lbzx 31 D A B 87 0
lfdux 31 D A B 631 0
lfdx 31 D A B 599 0
lfsux 31 D A B 567 0
lfsx 31 D A B 535 0
lhaux 31 D A B 375 0
lhax 31 D A B 343 0
lhbrx 31 D A B 790 0
lhzux 31 D A B 311 0
lhzx 31 D A B 279 0
lswi 3 31 D A NB 597 0
lswx 3 31 D A B 533 0
lwarx 31 D A B 20 0
lwbrx 31 D A B 534 0
lwzux 31 D A B 55 0
lwzx 31 D A B 23 0
mcrfs 63 crfD 00 crfS 00 00000 64 0
mcrxr 31 crfD 00 00000 00000 512 0
mfcr 31 D 00000 00000 19 0
mffsx 63 D 00000 00000 583 Rc
mfmsr 2 31 D 00000 00000 83 0
mfsr 2 31 D 0 SR 00000 595 0
mfsrin 2 31 D 00000 B 659 0
mtfsb0x 63 crbD 00000 00000 70 Rc
mtfsb1x 63 crbD 00000 00000 38 Rc
mtfsfix 63 crfD 00 00000 IMM 0 134 Rc
mtmsr 2 31 S 00000 00000 146 0
mtsr 2 31 S 0 SR 00000 210 0
nandx 31 S A B 476 Rc
norx 31 S A B 124 Rc
orx 31 S A B 444 Rc
orcx 31 S A B 412 Rc
slwx 31 S A B 24 Rc
srawx 31 S A B 792 Rc
srawix 31 S A SH 824 Rc
srwx 31 S A B 536 Rc
stbux 31 S A B 247 0
stbx 31 S A B 215 0
stfdux 31 S A B 759 0
stfdx 31 S A B 727 0
stfiwx 1 31 S A B 983 0
stfsux 31 S A B 695 0
stfsx 31 S A B 663 0
sthbrx 31 S A B 918 0
sthux 31 S A B 439 0
sthx 31 S A B 407 0
stswi 3 31 S A NB 725 0
stswx 3 31 S A B 661 0
stwbrx 31 S A B 662 0
stwcx. 31 S A B 150 1
stwux 31 S A B 183 0
stwx 31 S A B 151 0
sync 31 00000 00000 00000 598 0
tlbia 1,2 31 00000 00000 00000 370 0
tlbie 1,2 31 00000 00000 B 306 0
tlbsync 1,2 31 00000 00000 00000 566 0
tw 31 TO A B 4 0
xorx 31 S A B 316 Rc
Notes:
1 Optional instruction
2 Supervisor-level instruction
3 Load/store string/multiple instruction

Table A-36. XL-Form
OPCD BO BI 00000 XO LK
OPCD crbD crbA crbB XO 0
OPCD crfD 00 crfS 00 00000 XO 0
OPCD 00000 00000 00000 XO 0
Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
bcctrx 19 BO BI 00000 528 LK
bclrx 19 BO BI 00000 16 LK
crand 19 crbD crbA crbB 257 0
crandc 19 crbD crbA crbB 129 0
creqv 19 crbD crbA crbB 289 0
crnand 19 crbD crbA crbB 225 0
crnor 19 crbD crbA crbB 33 0
cror 19 crbD crbA crbB 449 0
crorc 19 crbD crbA crbB 417 0
crxor 19 crbD crbA crbB 193 0
isync 19 00000 00000 00000 150 0
mcrf 19 crfD 00 crfS 00 00000 0 0
rfi 1 19 00000 00000 00000 50 0
Note:
1 Supervisor-level instruction

Table A-37. XFX-Form
OPCD D spr XO 0
OPCD S 0 CRM 0 XO 0
OPCD S spr XO 0
OPCD D tbr XO 0
Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mfspr 1 31 D spr 339 0
mftb 31 D tbr 371 0
mtcrf 31 S 0 CRM 0 144 0
mtspr 1 31 S spr 467 0
Note:
1 Supervisor- and user-level instruction

Table A-38. XFL-Form
OPCD 0 FM 0 B XO Rc
Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mtfsfx 63 0 FM 0 B 711 Rc

Table A-39. XO-Form








OPCD D A B OE XO Rc
OPCD D A B 0 XO Rc
OPCD D A 00000 OE XO Rc
Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
addx 31 D A B OE 266 Rc
addcx 31 D A B OE 10 Rc
addex 31 D A B OE 138 Rc
addmex 31 D A 00000 OE 234 Rc
addzex 31 D A 00000 OE 202 Rc
divwx 31 D A B OE 491 Rc
divwux 31 D A B OE 459 Rc
mulhwx 31 D A B 0 75 Rc
mulhwux 31 D A B 0 11 Rc
mullwx 31 D A B OE 235 Rc
negx 31 D A 00000 OE 104 Rc
subfx 31 D A B OE 40 Rc
subfcx 31 D A B OE 8 Rc
subfex 31 D A B OE 136 Rc
subfmex 31 D A 00000 OE 232 Rc
subfzex 31 D A 00000 OE 200 Rc
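
To show how the OE and Rc columns of Table A-39 select among the variants of one XO-form instruction, here is a minimal Python sketch. It assumes the usual PowerPC layout for this form (OE in bit 21, a 9-bit extended opcode in bits 22-30, Rc in bit 31) and the conventional assembler suffixes `o` and `.`; the function name `encode_xo_form` is hypothetical.

```python
def encode_xo_form(opcd, rd, ra, rb, xo, oe=0, rc=0):
    """Pack an XO-form word: OPCD, D, A, B, OE (bit 21), XO (bits 22-30), Rc (bit 31)."""
    if not 0 <= xo < 512:
        raise ValueError("XO-form extended opcode must fit in 9 bits")
    if oe not in (0, 1) or rc not in (0, 1):
        raise ValueError("OE and Rc are single bits")
    return ((opcd << 26) | (rd << 21) | (ra << 16) | (rb << 11)
            | (oe << 10) | (xo << 1) | rc)


ADD_XO = 266  # extended opcode of addx in Table A-39

# The four variants covered by the single table entry addx, for r3 = r3 + r4
for name, oe, rc in (("add", 0, 0), ("add.", 0, 1), ("addo", 1, 0), ("addo.", 1, 1)):
    print(name, hex(encode_xo_form(31, 3, 3, 4, ADD_XO, oe, rc)))
```

Instructions listed with a literal 0 in the OE position (mulhwx, mulhwux) would be encoded with `oe=0` only.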

Table A-40. A-Form
OPCD D A B 00000 XO Rc
OPCD D A B C XO Rc
OPCD D A 00000 C XO Rc
OPCD D 00000 B 00000 XO Rc
Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
faddx 63 D A B 00000 21 Rc
faddsx 59 D A B 00000 21 Rc
fdivx 63 D A B 00000 18 Rc
fdivsx 59 D A B 00000 18 Rc
fmaddx 63 D A B C 29 Rc
fmaddsx 59 D A B C 29 Rc
fmsubx 63 D A B C 28 Rc
fmsubsx 59 D A B C 28 Rc
fmulx 63 D A 00000 C 25 Rc
fmulsx 59 D A 00000 C 25 Rc
fnmaddx 63 D A B C 31 Rc
fnmaddsx 59 D A B C 31 Rc
fnmsubx 63 D A B C 30 Rc
fnmsubsx 59 D A B C 30 Rc
fresx 1 59 D 00000 B 00000 24 Rc
frsqrtex 1 63 D 00000 B 00000 26 Rc
fselx 1 63 D A B C 23 Rc
fsqrtx 1,2 63 D 00000 B 00000 22 Rc
fsqrtsx 1,2 59 D 00000 B 00000 22 Rc
fsubx 63 D A B 00000 20 Rc
fsubsx 59 D A B 00000 20 Rc
Notes:
1 Optional instruction
2 32-bit instruction not implemented by the MPC750

Table A-41. M-Form
OPCD S A SH MB ME Rc
OPCD S A B MB ME Rc
Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
rlwimix 20 S A SH MB ME Rc
rlwinmx 21 S A SH MB ME Rc
rlwnmx 23 S A B MB ME Rc





























A.5 Instruction Set Legend

Table A-42 provides general information on the PowerPC instruction set (such as the architectural level, privilege level, and form).

Table A-42. PowerPC Instruction Set Legend
Name UISA VEA OEA Supervisor Level Optional Form
addx √ - - - - XO
addcx √ - - - - XO
addex √ - - - - XO
addi √ - - - - D
addic √ - - - - D
addic. √ - - - - D
addis √ - - - - D
addmex √ - - - - XO
addzex √ - - - - XO
andx √ - - - - X
andcx √ - - - - X
andi. √ - - - - D
andis. √ - - - - D
bx √ - - - - I
bcx √ - - - - B
bcctrx √ - - - - XL
bclrx √ - - - - XL
cmp √ - - - - X
cmpi √ - - - - D
cmpl √ - - - - X
cmpli √ - - - - D
cntlzwx √ - - - - X
crand √ - - - - XL
crandc √ - - - - XL
creqv √ - - - - XL
crnand √ - - - - XL
crnor √ - - - - XL
cror √ - - - - XL
crorc √ - - - - XL
crxor √ - - - - XL
dcba - √ - - √ X
dcbf - √ - - - X
dcbi - - √ √ - X
dcbst - √ - - - X
dcbt - √ - - - X
dcbtst - √ - - - X
dcbz - √ - - - X
divwx √ - - - - XO
divwux √ - - - - XO
eciwx - √ - - √ X
ecowx - √ - - √ X
eieio - √ - - - X
eqvx √ - - - - X
extsbx √ - - - - X
extshx √ - - - - X
fabsx √ - - - - X
faddx √ - - - - A
faddsx √ - - - - A
fcmpo √ - - - - X
fcmpu √ - - - - X
fctiwx √ - - - - X
fctiwzx √ - - - - X
fdivx √ - - - - A
fdivsx √ - - - - A
fmaddx √ - - - - A
fmaddsx √ - - - - A
fmrx √ - - - - X
fmsubx √ - - - - A
fmsubsx √ - - - - A
fmulx √ - - - - A
fmulsx √ - - - - A
fnabsx √ - - - - X
fnegx √ - - - - X
fnmaddx √ - - - - A
fnmaddsx √ - - - - A
fnmsubx √ - - - - A
fnmsubsx √ - - - - A
fresx √ - - - √ A
frspx √ - - - - X
frsqrtex √ - - - √ A
fselx √ - - - √ A
fsqrtx √ - - - √ A
fsqrtsx √ - - - √ A
fsubx √ - - - - A
fsubsx √ - - - - A
icbi - √ - - - X
isync - √ - - - XL
lbz √ - - - - D
lbzu √ - - - - D
lbzux √ - - - - X
lbzx √ - - - - X
lfd √ - - - - D
lfdu √ - - - - D
lfdux √ - - - - X
lfdx √ - - - - X
lfs √ - - - - D
lfsu √ - - - - D
lfsux √ - - - - X
lfsx √ - - - - X
lha √ - - - - D
lhau √ - - - - D
lhaux √ - - - - X
lhax √ - - - - X
lhbrx √ - - - - X
lhz √ - - - - D
lhzu √ - - - - D
lhzux √ - - - - X
lhzx √ - - - - X
lswi 2 √ - - - - X
lswx 2 √ - - - - X
lwarx √ - - - - X
lwbrx √ - - - - X
lwz √ - - - - D
lwzu √ - - - - D
lwzux √ - - - - X
lwzx √ - - - - X
mcrf √ - - - - XL
mcrfs √ - - - - X
mcrxr √ - - - - X
mfcr √ - - - - X
mffsx √ - - - - X
mfmsr - - √ √ - X
mfspr 1 √ - √ √ - XFX
mfsr - - √ √ - X
mfsrin - - √ √ - X
mftb - √ - - - XFX
mtcrf √ - - - - XFX
mtfsb0x √ - - - - X
mtfsb1x √ - - - - X
mtfsfx √ - - - - XFL
mtfsfix √ - - - - X
mtmsr - - √ √ - X
mtspr 1 √ - √ √ - XFX
mtsr - - √ √ - X
mtsrin - - √ √ - X
mulhwx √ - - - - XO
mulhwux √ - - - - XO
mulli √ - - - - D
mullwx √ - - - - XO
nandx √ - - - - X
negx √ - - - - XO
norx √ - - - - X
orx √ - - - - X
orcx √ - - - - X
ori √ - - - - D
oris √ - - - - D
rfi - - √ √ - XL
rlwimix √ - - - - M
rlwinmx √ - - - - M
rlwnmx √ - - - - M
sc √ - √ - - SC
slwx √ - - - - X
srawx √ - - - - X
srawix √ - - - - X
srwx √ - - - - X
stb √ - - - - D
stbu √ - - - - D
stbux √ - - - - X
stbx √ - - - - X
stfd √ - - - - D
stfdu √ - - - - D
stfdux √ - - - - X
stfdx √ - - - - X
stfiwx √ - - - √ X
stfs √ - - - - D
stfsu √ - - - - D
stfsux √ - - - - X
stfsx √ - - - - X
sth √ - - - - D
sthbrx √ - - - - X
sthu √ - - - - D
sthux √ - - - - X
sthx √ - - - - X
stmw 2 √ - - - - D
stswi 2 √ - - - - X
stswx 2 √ - - - - X
stw √ - - - - D
stwbrx √ - - - - X
stwcx. √ - - - - X
stwu √ - - - - D
stwux √ - - - - X
stwx √ - - - - X
subfx √ - - - - XO
subfcx √ - - - - XO
subfex √ - - - - XO
subfic √ - - - - D
subfmex √ - - - - XO
subfzex √ - - - - XO
sync √ - - - - X
tlbia - - √ √ √ X
tlbie - - √ √ √ X
tlbsync - - √ √ √ X
tw √ - - - - X
twi √ - - - - D
xorx √ - - - - X
xori √ - - - - D
xoris √ - - - - D
Notes:
1 Supervisor- and user-level instruction
2 Load/store string or multiple instruction
3 32-bit instruction not implemented by the MPC750
4 Instruction is optional for 64-bit implementations only.

Appendix B
Instructions Not Implemented

This appendix lists the 32-bit and 64-bit PowerPC instructions that are not implemented in the MPC750 microprocessor. Any attempt to execute an instruction that is not implemented on the MPC750 generates an illegal instruction exception. Note that exceptions are referred to as interrupts in the architecture specification.

Table B-1 lists the 32-bit PowerPC instructions that are optional in the PowerPC architecture and are not implemented by the MPC750.

Table B-1. 32-Bit Instructions Not Implemented by the MPC750 Processor
Mnemonic Instruction
dcba Data Cache Block Allocate
fsqrt Floating Square Root (Double-Precision)
fsqrts Floating Square Root Single
tlbia TLB Invalidate All
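
As an illustration of how a disassembler or simulator might flag the four instructions in Table B-1, the following Python sketch matches an instruction word against their opcode pairs as listed in Appendix A. It is a sketch under stated assumptions, not a description of the processor's decode logic: the function and table names are hypothetical, and reserved-field checks are ignored.

```python
# (primary opcode, extended opcode) pairs taken from the Appendix A tables.
UNIMPLEMENTED_X_FORM = {(31, 758): "dcba", (31, 370): "tlbia"}  # XO in bits 21-30
UNIMPLEMENTED_A_FORM = {(63, 22): "fsqrt", (59, 22): "fsqrts"}  # XO in bits 26-30


def unimplemented_32bit_mnemonic(word):
    """Return the Table B-1 mnemonic matching this word, or None if there is none."""
    opcd = (word >> 26) & 0x3F
    name = UNIMPLEMENTED_X_FORM.get((opcd, (word >> 1) & 0x3FF))
    if name is None:
        # A-form instructions use only a 5-bit extended opcode.
        name = UNIMPLEMENTED_A_FORM.get((opcd, (word >> 1) & 0x1F))
    return name


dcba_word = (31 << 26) | (3 << 16) | (4 << 11) | (758 << 1)   # dcba r3,r4
fsqrt_word = (63 << 26) | (1 << 21) | (2 << 11) | (22 << 1)   # fsqrt f1,f2
print(unimplemented_32bit_mnemonic(dcba_word))
print(unimplemented_32bit_mnemonic(fsqrt_word))
```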











Table B-2 provides a list of 64-bit instructions that are not implemented by the MPC750. 
Table B-2. 64-Bit Instructions Not Implemented by the MPC750 Processor 









































Mnemonic Instruction 
cntlzd Count Leading Zeros Double Word 
divd Divide Double Word 
divdu Divide Double Word Unsigned 
extsw Extend Sign Word 
fcfid Floating Convert From Integer Double Word 
fctid Floating Convert to Integer Double Word 
fctidz Floating Convert to Integer Double Word with Round toward Zero 
ld Load Double Word 
ldarx Load Double Word and Reserve Indexed 
ldu Load Double Word with Update 
ldux Load Double Word with Update Indexed 
ldx Load Double Word Indexed 
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Table B-2. 64-Bit Instructions Not Implemented by the MPC750 Processor 






















































































Mnemonic Instruction 
lwa Load Word Algebraic 
lwaux Load Word Algebraic with Update Indexed 
lwax Load Word Algebraic Indexed 
mtmsrd Move to Machine State Register Double Word 
mtsrd Move to Segment Register Double Word 
mtsrdin Move to Segment Register Double Word Indirect 
mulld Multiply Low Double Word 
mulhd Multiply High Double Word 
mulhdu Multiply High Double Word Unsigned 
rldcl Rotate Left Double Word then Clear Left 
rldcr Rotate Left Double Word then Clear Right 
rldic Rotate Left Double Word Immediate then Clear 
rldicl Rotate Left Double Word Immediate then Clear Left 
rldicr Rotate Left Double Word Immediate then Clear Right 
rldimi Rotate Left Double Word Immediate then Mask Insert 
slbia SLB Invalidate All 
slbie SLB Invalidate Entry 
sld Shift Left Double Word 
srad Shift Right Algebraic Double Word 
sradi Shift Right Algebraic Double Word Immediate 
srd Shift Right Double Word 
std Store Double Word 
stdcx. Store Double Word Conditional Indexed 
stdu Store Double Word with Update 
stdux Store Double Word Indexed with Update 
stdx Store Double Word Indexed 
td Trap Double Word 
tdi Trap Double Word Immediate 
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Appendix C 
MPC755 Embedded G3 Microprocessor 


The MPC755 is a derivative of the MPC750 microprocessor design and is intended 
primarily for use in embedded systems. All of the information in the MPC750 RISC 
Microprocessor Family User’s Manual applies to the MPC755 microprocessor with the 
exceptions and additions noted in this appendix. In the event the two descriptions conflict 
with each other, this appendix supersedes the information in the MPC750 RISC 
Microprocessor Family User’s Manual. 


The MPC745 is a lower-pin-count device that operates identically to the MPC755, except 
that it doesn’t implement the L2 cache interface. In the same way that the MPC750 User’s 
Manual also describes the functionality of the MPC740, this appendix describes the 
functionality of the MPC745. All information herein applies to the MPC745, except where 
otherwise noted (in particular, the L2 cache information does not apply to the MPC745). 


This document describes specific details about the implementation of the MPC755 as a 
low-power, 32-bit member of the processor family that implements the PowerPC 
architecture, and how it differs from the MPC750. Note that the individual section headings 
indicate the chapters in the MPC750 User’s Manual to which they correspond. The sections 
are as follows: 


• C.1, “MPC755 Overview,” describes general features of the MPC755 with respect 
to the PowerPC architecture. 

• C.4, “The MPC755 Programming Model (Chapter 2),” describes the differences 
between the programming model of the MPC750 and MPC755. 

• C.5, “MPC755 L1 Instruction and Data Cache Operation (Chapter 3),” describes the 
aspects of the L1 instruction and data cache operation that are specific to the 
MPC755. 

• C.6, “MPC755 Exceptions (Chapter 4),” describes how the MPC755 embedded 
processor implements the exception model defined by the PowerPC operating 
environment architecture (OEA). 

• C.7, “MPC755 Memory Management (Chapter 5),” describes the MPC755 
embedded processor’s implementation of the memory management unit (MMU) 
specifications provided by the PowerPC operating environment architecture (OEA). 

• C.8, “MPC755 Instruction Timing (Chapter 6),” describes how the MPC755 
embedded processor fetches, dispatches, and executes instructions and how it 
reports the results of instruction execution. 

• C.9, “MPC755 Signal Descriptions (Chapter 7),” describes the MPC755 embedded 
processor’s external signals. 

• C.10, “MPC755 System Interface Operation (Chapter 8),” describes the MPC755 
embedded processor bus interface and its operation. 

• C.11, “MPC755 L2 Cache Interface Operation (Chapter 9),” describes the L2 cache 
interface and the private memory features of the MPC755. 

• C.12, “Power and Thermal Management (Chapter 10),” describes the hardware 
support provided by the MPC755 for power and thermal management. 

• C.13, “Performance Monitor (Chapter 11),” describes the performance monitor of 
the MPC755. 


Errata for the previous version of the MPC750 RISC Microprocessor Family User’s Manual 
are listed in Appendix D, “User’s Manual Revision History.” These corrections also apply to 
the MPC740. 


Table C-1 provides a revision history for this appendix. 


Table C-1. Document Revision History 











Document Revision    Substantive Change(s) 
Rev. 0-2             Initial release of the MPC750 RISC Microprocessor Family User’s Manual Errata. 
Rev. 3               Combined the MPC750 User’s Manual Errata and MPC755 Supplement documents. 
                     MPC755 Supplement, Section 9.2.2—Edited first sentence of second bullet. 
                     MPC755 Supplement, Section 9.4.1—In Table 26, replaced L2SL (bit 16) description. 
This appendix        This material was taken from the MPC755 Supplement and added to Revision 1 of the 
                     MPC750 RISC Microprocessor Family User’s Manual as this appendix. No 
                     substantive changes were made to the information. 


C.1 MPC755 Overview 


This section is an overview of the MPC755. The following list of functional additions to the 
MPC755 from the MPC750 summarizes the changes visible either to a programmer or a 
system designer. 


• Instruction and data cache locking mechanism added 

• Four IBAT and four DBAT entries added 

• Software table search mode added 

• Four special-purpose (SPRG) registers added 

• Parity generation and detection on L2 address bus added 

• Instruction-only mode to L2 cache added 

• Private SRAM capability to L2 cache interface added 

• PB3-type SRAM support to L2 cache interface added 

• 32-bit data bus mode added 

• Bus voltage select (BVSEL) and L2 cache interface voltage select (L2VSEL) added 


C.2 MPC755 Functional Description 


This section summarizes some of the functional differences between the MPC750 and the 
MPC755. For information about the MPC755 L1 cache, see C.5, “MPC755 L1 Instruction 
and Data Cache Operation (Chapter 3).” 


The MPC755 has independent on-chip, 32-Kbyte, eight-way set-associative, physically 
addressed caches for instructions and data and independent instruction and data memory 
management units (MMUs). Each MMU has a 128-entry, two-way set-associative 
translation lookaside buffer (DTLB and ITLB) that saves recently used page address 
translations. Block address translation on the MPC755 is performed by either two 
four-entry or two eight-entry BAT arrays—one for instruction and one for data block 
address translation (IBAT and DBAT arrays). Note that the IBAT and DBAT arrays defined 
by the PowerPC architecture only contain four entries each. During block translation, 
effective addresses are compared simultaneously with all enabled BAT entries. The 
MPC755 also optionally supports software table search operations. 
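
As a rough illustration of the geometry described above, the following Python sketch derives the number of sets in each L1 cache and in each TLB. The 32-byte block size is taken from the feature list in C.3; the function name `num_sets` and the constant names are hypothetical and exist only for this example.

```python
def num_sets(total_entries: int, ways: int) -> int:
    """Return the number of sets in a set-associative structure."""
    if total_entries % ways != 0:
        raise ValueError("entries must divide evenly among the ways")
    return total_entries // ways


# L1 cache: 32 Kbytes of 32-byte blocks, eight ways (block size from C.3).
L1_BLOCKS = (32 * 1024) // 32
L1_SETS = num_sets(L1_BLOCKS, ways=8)

# Each TLB: 128 entries, two ways.
TLB_SETS = num_sets(128, ways=2)

print(L1_SETS, TLB_SETS)
```

Under these figures each L1 cache has 128 sets of eight blocks, and each TLB has 64 sets of two entries.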


The L2 cache is implemented with an on-chip, two-way set-associative tag memory, and 
with external, synchronous SRAMs for data storage. The external SRAMs are accessed 
through a dedicated L2 cache port that supports a single bank of up to 1 Mbyte of 
synchronous SRAMs. For information about the L2 cache implementation, see C.11, 
“MPC755 L2 Cache Interface Operation (Chapter 9).” 


The MPC755 has a 32-bit address bus and a 32/64-bit data bus. Multiple devices compete 
for system resources through a central external arbiter. The MPC755 three-state 
cache-coherency protocol (MEI) supports the exclusive, modified, and invalid states, a 
compatible subset of the modified/exclusive/shared/invalid (MESI) four-state protocol, and 
it operates coherently in systems with four-state caches. The MPC755 supports single-beat 
and burst data transfers for memory accesses and memory-mapped I/O operations. The 
system interface is described in C.9, “MPC755 Signal Descriptions (Chapter 7),” and C.10, 
“MPC755 System Interface Operation (Chapter 8).” 




The MPC755 has four software-controllable power-saving modes. Three static modes 
(doze, nap, and sleep) progressively reduce power dissipation. When functional units are 
idle, a dynamic power management mode causes those units to enter a low-power mode 
automatically without affecting operational performance, software execution, or external 
hardware. The MPC755 also provides a thermal assist unit (TAU) and a way to reduce the 
instruction fetch rate for limiting power dissipation. Power management is described in 
C.12, “Power and Thermal Management (Chapter 10).” 


Figure C-1 shows the MPC755 block diagram and parallel organization of the execution 
units (shaded in the diagram). The instruction unit fetches, dispatches, and predicts branch 
instructions. Note that this is a conceptual model that shows basic features rather than 
attempting to show how features are implemented physically. 
Figure C-1. MPC755 Block Diagram 


C.3 MPC755 Features 


This section lists the features of the MPC755. The interrelationship of these features is 
shown in Figure C-1. The major features of the MPC755 are as follows: 


• High-performance, superscalar microprocessor 


— As many as four instructions can be fetched from the instruction cache per clock 
cycle 


— As many as two instructions can be dispatched per clock 


— As many as six instructions can execute per clock (including two integer 
instructions) 


— Single-clock-cycle execution for most instructions 


• Six independent execution units and two register files 


— BPU featuring both static and dynamic branch prediction 


64-entry (16-set, four-way set-associative) branch target instruction cache 
(BTIC), a cache of branch instructions that have been encountered in 
branch/loop code sequences. If a target instruction is in the BTIC, it is fetched 
into the instruction queue a cycle sooner than it can be made available from 
the instruction cache. Typically, if a fetch access hits the BTIC, it provides the 
first two instructions in the target stream. 


512-entry branch history table (BHT) with two bits per entry for four levels of 
prediction—not-taken, strongly not-taken, taken, strongly taken 

Branch instructions that do not update the count register (CTR) or link register 
(LR) are removed from the instruction stream 


— Two integer units (IUs) that share thirty-two 32-bit GPRs for integer operands 


IU1 can execute any integer instruction 


IU2 can execute all integer instructions except multiply and divide 
instructions (shift, rotate, arithmetic, and logical instructions). Most 
instructions that execute in the IU2 take one cycle to execute. The IU2 has a 
single-entry reservation station. 


— Three-stage floating-point unit (FPU) 


Fully IEEE 754-1985-compliant FPU for both single- and double-precision 
operations 


Supports non-IEEE mode for time-critical operations 
Hardware support for denormalized numbers 
Single-entry reservation station 


Thirty-two 64-bit FPRs for single- or double-precision operands 


— Two-stage load/store unit (LSU) 


Two-entry reservation station 
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Single-cycle, pipelined cache access 

Dedicated adder performs EA calculations 

Performs alignment and precision conversion for floating-point data 
Performs alignment and sign extension for integer data 

Three-entry store queue 

Supports both big- and little-endian modes 


— System register unit (SRU) handles miscellaneous instructions 


Executes CR logical and Move to/Move from SPR instructions (mtspr and 
mfspr) 


Single-entry reservation station 


• Rename buffers 
— Six GPR rename buffers 
— Six FPR rename buffers 
— Condition register buffering supports two CR writes per clock 


• Completion unit 


— The completion unit retires an instruction from the six-entry reorder buffer 
(completion queue) when all instructions ahead of it have been completed, the 
instruction has finished execution, and no exceptions are pending. 


— Guarantees sequential programming model (precise exception model) 


— Monitors all dispatched instructions and retires them in order 


— Tracks unresolved branches and flushes instructions from the mispredicted 
branch 


— Retires as many as two instructions per clock 


• Separate on-chip instruction and data caches (Harvard architecture) 


— 32-Kbyte, eight-way set-associative instruction and data caches 


— Pseudo least-recently-used (PLRU) replacement algorithm 
— 32-byte (eight-word) cache block 
— Physically indexed/physical tags 


— Cache write-back or write-through operation programmable on a per-page or 
per-block basis 


— Instruction cache can provide four instructions per clock; data cache can provide 
two words per clock 


— Caches can be disabled in software 


— As many as six of the eight ways in each cache can be locked, or the entire 
cache can be locked, in software 


— Data cache coherency (MEI) maintained in hardware 

— The critical double word is made available to the requesting unit when it is burst 
into the line-fill buffer. The cache is nonblocking, so it can be accessed during 
this operation. 


• Level 2 (L2) cache interface (the L2 cache interface is not supported in the MPC745) 
— On-chip two-way set-associative L2 cache controller and tags 
— External data SRAMs 
— Support for 256-Kbyte, 512-Kbyte, and 1-Mbyte L2 caches 
— 64-byte (256-Kbyte/512-Kbyte) and 128-byte (1-Mbyte) sectored line size 


— Supports flow-through (register-buffer), both PB2 and PB3 pipelined 
(register-register), and pipelined late-write (register-register) synchronous burst 
SRAMs 


• Separate memory management units (MMUs) for instructions and data 
— 52-bit virtual address; 32-bit physical address 


— Address translation for 4-Kbyte pages, variable-sized blocks, and 256-Mbyte 
segments 


— Memory programmable as write-back/write-through, cacheable/noncacheable, 
and coherency enforced/coherency not enforced on a page or block basis 


— Separate IBATs and DBATs (selectable four or eight each) also defined as SPRs 
— Separate instruction and data translation lookaside buffers (TLBs) 


Both TLBs are 128-entry, two-way set-associative, and use PLRU 
replacement algorithm 


— TLBs are reloaded by the hardware or optionally, by software 


• Separate bus interface units for system memory and for the L2 cache 


— Bus interface features include the following: 


Selectable bus-to-core clock frequency ratios as described in the MPC755 
Hardware Specification 


32/64-bit, split-transaction external data bus with burst transfers with 32-bit 
mode selectable at reset 


Support for address pipelining and limited out-of-order bus transactions 
Single-entry load queue 

Single-entry instruction fetch queue 

Two-entry L1 cache castout queue 


No-DRTRY mode eliminates the DRTRY signal from the qualified bus grant. 
This allows the forwarding of data during load operations to the internal core 
one bus cycle sooner than if the use of DRTRY is enabled. 








— L2 cache interface features (which are not implemented on the MPC745) include 
the following: 

— Core-to-L2 frequency divisors as described in the MPC755 Hardware 
Specification 

— Four-entry L2 cache castout queue in L2 cache BIU 

— 17-bit address bus 

— 64-bit data bus 

— 8-bit parity for address and data 


— Private memory mode, allowing software to access L2 SRAM as private 
memory space 


• Multiprocessing support features include the following: 
— Hardware-enforced, three-state cache coherency protocol (MEI) for data cache 
— Load/store with reservation instruction pair for atomic memory references, 
semaphores, and other multiprocessor operations 

• Power and thermal management 
— Three static modes (doze, nap, and sleep) progressively reduce power 
dissipation: 
  — Doze—All the functional units are disabled except for the time 
  base/decrementer registers and the bus snooping logic. 
  — Nap—The nap mode further reduces power consumption by disabling all 
  functional units, disabling snooping, and leaving only the time base register 
  and the PLL in a powered state. If snooping is required, the QACK input 
  signal can be negated to wake up the processor and snooping logic. 
  — Sleep—All internal functional units are disabled, after which external system 
  logic may disable the PLL and SYSCLK. 
— Thermal management facility provides software-controllable thermal 
management. Thermal management is performed through the use of three 
supervisor-level registers and an MPC755-specific thermal management 
exception. 
— Instruction cache throttling provides control of instruction fetching to limit 
power consumption. 

• Performance monitor can be used to help debug system designs and improve 
software efficiency. 

• In-system testability and debugging features through JTAG boundary-scan 
capability 
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C.4 The MPC755 Programming Model (Chapter 2) 


This section describes the differences between the programming model of the MPC750 and 
MPC755. For detailed information about architecture-defined features, see the 
Programming Environments Manual. This section is organized as follows: 


• Section C.4.1, “MPC755-Specific Registers,” 
• Section C.4.2, “MPC750 and MPC755 Instruction Use,” and 
• Section C.4.3, “tlbld and tlbli Instructions.” 


Figure C-2 shows the registers implemented in the MPC755, indicating those that are 
defined by the PowerPC architecture and those that are MPC755-specific. 
C.4.1 MPC755-Specific Registers 


The MPC755 processor programming model is functionally identical to that of the 
MPC750 except for some differences in the PVR (described in Section C.4.1.2, “Processor 
Version Register (PVR)”) and the L2CR (described in Section C.11.4.1, “L2 Cache Control 
Register (L2CR)”). Additionally, the following special-purpose registers are added in the 
MPC755 that are not defined by the PowerPC architecture: 


• Special-purpose registers used for general purpose (SPRG[4-7])—Four additional 
SPRG registers have been implemented to assist in searching the page tables in 
software. They replace the MSR[TGPR] bit and the four temporary general-purpose 
registers of the MPC603e. Note that the MSR[TGPR] bit is not 
implemented in the MPC755. If software table searching is not enabled, then these 
registers may be used for any supervisor purpose. The format of these registers is the 
same as that of SPRG[0-3] defined in Chapter 2, “Programming Model.” 


• Hardware implementation-dependent register 2 (HID2)—This register, which is not 
implemented in the MPC750, is used to enable L2 address parity, software table 
search operations, IBAT[4-7] and DBAT[4-7], and instruction and data cache way 
locking. This register is described in Section C.4.1.3, “Hardware 
Implementation-Dependent Register 2 (HID2).” 


• Instruction and data block address translation entries (IBAT[4-7] and DBAT[4-7]), 
which are optionally enabled in HID2—BATs are software-controlled arrays that 
store the available block address translations on-chip. BAT array entries are 
implemented as pairs of BAT registers that are accessible as supervisor 
special-purpose registers (SPRs). Four additional IBAT and four additional DBAT 
array entries provide a mechanism for translating additional blocks as large as 
256 Mbytes from the 32-bit effective address space into the physical memory space. 
This can be used for translating large address ranges whose mappings do not change 
frequently. The format of these registers is the same as that of IBAT[0-3] and 
DBAT[0-3] defined in Chapter 2, “Programming Model.” The SPR numbers for 
accessing these registers are outlined in Table C-2. 


• The software table search registers are as follows (see C.7, “MPC755 Memory 
Management (Chapter 5),” for more detailed information): 


— Data and instruction TLB miss registers (DMISS and IMISS)—The DMISS and 
IMISS registers contain the effective page address of the access that caused the 
TLB miss exception. The contents are used by the MPC755 when calculating the 
values of HASH1 and HASH2, and by the tlbld and tlbli instructions when 
loading a new TLB entry. 

— Data and instruction TLB compare registers (DCMP and ICMP)—These 
registers contain the first word in the required page table entry (PTE). The 
contents are constructed automatically from the contents of the segment registers 
and the effective address (DMISS or IMISS) when a TLB miss exception occurs. 
Each PTE read from the tables during the table search process should be 
compared with this value to determine whether or not the PTE is a match. Upon 
execution of a tlbld or tlbli instruction the upper 25 bits of the DCMP or ICMP 
register and 11 bits of the effective address operand are loaded into the first word 
of the selected TLB entry. 


— Primary and secondary hash address registers (HASH1 and HASH2)—These 
registers contain the physical addresses of the primary and secondary page table 
entry groups (PTEGs) for the access that caused the TLB miss exception. For 
convenience, the MPC755 automatically constructs the full physical address by 
routing bits 0-6 of SDR1 into HASH1 and HASH2 and clearing the lower 6 bits. 
These registers are read-only and are constructed from the contents of the 
DMISS or IMISS register (the register choice is determined by which miss was 
last acknowledged). 


— Required physical address register (RPA)—During a page table search 
operation, the software must load the RPA with the second word of the correct 
PTE. When the tlbld or tlbli instruction is executed, the contents of the RPA 
register and the DMISS or IMISS register are merged and loaded into the 
selected TLB entry. The referenced (R) bit is ignored when the write occurs (no 
location exists in the TLB entry for this bit). The RPA register can be read and 
written by software. 

• L2 private memory control register (L2PM)—The L2 cache private memory control 
register allows a portion of the physical address space to be directly mapped into a 
portion of the L2 SRAM. It is a supervisor-only, read/write, implementation-specific 
special-purpose register (SPR) that is accessed as SPR 1016 (decimal). The L2PM 
is initialized to all 0s during power-on reset and is described more completely in 
Section C.11.4.2, “L2 Private Memory Control Register (L2PM).” 
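
The comparison step described above for DCMP and ICMP can be sketched in Python. This sketch assumes the first-word PTE layout that the PowerPC architecture defines for 32-bit implementations (V in bit 0, VSID in bits 1-24, H in bit 25, API in bits 26-31, with bit 0 as the most-significant bit); that layout comes from the Programming Environments Manual rather than from this appendix, and the helper names `compare_word` and `pte_matches` are hypothetical.

```python
def compare_word(vsid: int, ea: int, secondary: bool = False) -> int:
    """Build the expected first PTE word for a 32-bit effective address.

    Assumed layout: V (bit 0) | VSID (bits 1-24) | H (bit 25) | API (bits 26-31).
    """
    if not 0 <= vsid < (1 << 24):
        raise ValueError("VSID is a 24-bit value")
    api = (ea >> 22) & 0x3F      # EA bits 4-9, the abbreviated page index
    h = 1 if secondary else 0    # hash function identifier
    return (1 << 31) | (vsid << 7) | (h << 6) | api   # V is set to 1


def pte_matches(pte_word0: int, expected: int) -> bool:
    """Return True when a PTE's first word equals the compare value."""
    return (pte_word0 & 0xFFFFFFFF) == expected


expected = compare_word(0x123456, 0x0FC00000)
print(hex(expected), pte_matches(0x891A2B3F, expected))
```

A software table search handler would perform the equivalent comparison against each PTE in the group addressed by HASH1 and, if no match is found, repeat it for the group addressed by HASH2.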


C.4.1.1. The MPC755 Additional SPR Encodings 


Table C-2 describes the encodings of the MPC755 register set additions described in this 
section. 


Table C-2. Additional SPR Encodings 


























SPR (Decimal) SPR[5-9] SPR[0-4] Register Access 
276 01000 10100 SPRG4 Supervisor 
277 01000 10101 SPRG5 Supervisor 
278 01000 10110 SPRG6 Supervisor 
279 01000 10111 SPRG7 Supervisor 
560 10001 10000 IBAT4U Supervisor 
561 10001 10001 IBAT4L Supervisor 
562 10001 10010 IBAT5U Supervisor 
563 10001 10011 IBAT5L Supervisor 
564 10001 10100 IBAT6U Supervisor 
565 10001 10101 IBAT6L Supervisor 
566 10001 10110 IBAT7U Supervisor 
567 10001 10111 IBAT7L Supervisor 
568 10001 11000 DBAT4U Supervisor 
569 10001 11001 DBAT4L Supervisor 
570 10001 11010 DBAT5U Supervisor 
571 10001 11011 DBAT5L Supervisor 
572 10001 11100 DBAT6U Supervisor 
573 10001 11101 DBAT6L Supervisor 
574 10001 11110 DBAT7U Supervisor 
575 10001 11111 DBAT7L Supervisor 
976 11110 10000 DMISS Supervisor 
977 11110 10001 DCMP Supervisor 
978 11110 10010 HASH1 Supervisor 
979 11110 10011 HASH2 Supervisor 
980 11110 10100 IMISS Supervisor 
981 11110 10101 ICMP Supervisor 
982 11110 10110 RPA Supervisor 
1011 11111 10011 HID2 Supervisor 
1016 11111 11000 L2PM Supervisor 
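
The relationship between the decimal SPR number and the two 5-bit columns in Table C-2 can be checked with a short Python sketch. The function names are hypothetical. The swapped ordering returned by `spr_instruction_field` follows the mtspr/mfspr convention described in the Programming Environments Manual and should be treated as an assumption here, since this appendix does not restate it.

```python
def spr_fields(spr: int) -> tuple:
    """Split a 10-bit SPR number into the two 5-bit halves shown in Table C-2."""
    if not 0 <= spr < 1024:
        raise ValueError("SPR numbers are 10 bits wide")
    high = (spr >> 5) & 0x1F   # column labelled SPR[5-9]
    low = spr & 0x1F           # column labelled SPR[0-4]
    return format(high, "05b"), format(low, "05b")


def spr_instruction_field(spr: int) -> int:
    """Return the 10-bit spr field with its halves swapped (assumed mtspr/mfspr order)."""
    if not 0 <= spr < 1024:
        raise ValueError("SPR numbers are 10 bits wide")
    return ((spr & 0x1F) << 5) | ((spr >> 5) & 0x1F)


print(spr_fields(276))    # SPRG4
print(spr_fields(1011))   # HID2
```

For example, 1011 (HID2) splits into 11111 and 10011, matching the table row.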




















C.4.1.2 Processor Version Register (PVR) 


The processor version register (PVR) is a 32-bit, read-only register. It is also present in 
the MPC750, but the MPC755 initializes it to a different value. It contains a value 
identifying the specific version (model) and revision level of the processor (see Figure C-3). 
The contents of the PVR can be copied to a GPR by the mfspr instruction. Read access to 
the PVR is supervisor-level only; write access is not provided. 





Version (bits 0-15) | Revision (bits 16-31) 

Figure C-3. Processor Version Register (PVR) 
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The PVR consists of two 16-bit fields: 


• Version (bits 0-15)—A 16-bit number that uniquely identifies a particular processor 
version. This number can be used to determine the version of a processor; it may not 
distinguish between different end product models if more than one model uses the 
same processor. 


• Revision (bits 16-31)—A 16-bit number that distinguishes between various releases 
of a particular version (that is, an engineering change level). The value of the 
revision portion of the PVR is implementation-specific. The processor revision level 
is changed for each revision of the device. 


Software can distinguish between the MPC750 and the MPC755 by reading the PVR. The 
MPC755 PVR reads as 0x0008_3100. The version is 0x0008 and the revision level starts at 
0x3100. 
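
A minimal Python sketch of splitting a PVR value into its two fields follows. It assumes the value has already been read (for example, by supervisor-level code using mfspr) and handed to the function as an integer; the name `decode_pvr` is hypothetical.

```python
def decode_pvr(pvr: int) -> tuple:
    """Split a 32-bit PVR value into (version, revision)."""
    pvr &= 0xFFFFFFFF
    version = pvr >> 16        # bits 0-15, the most-significant half
    revision = pvr & 0xFFFF    # bits 16-31, the least-significant half
    return version, revision


version, revision = decode_pvr(0x00083100)
print(hex(version), hex(revision))
```

Applied to the MPC755 value quoted above, this yields version 0x0008 and revision 0x3100.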


C.4.1.3. Hardware Implementation-Dependent Register 2 (HID2) 


The MPC755 implements an additional hardware implementation-dependent register, 
HID2, which is not described in Chapter 2, “Programming Model,” and is shown in 
Figure C-4. It is a supervisor-only, read/write, implementation-specific 
special-purpose register (SPR) that is accessed as SPR 1011 (decimal). 
























































Bits:   0-10      11       12      13           14-15     16-18       19-23     24-26       27-31 
Field:  Reserved  L2AP_EN  SWT_EN  HIGH_BAT_EN  Reserved  IWLCK[0-2]  Reserved  DWLCK[0-2]  Reserved 


Figure C-4. Hardware Implementation-Dependent Register 2 (HID2) 


Table C-3 describes the HID2 fields. 


Table C-3. Hardware Implementation Dependent Register 2 (HID2) 
Field Descriptions 











Bits Name Description 
0-10 — Reserved 
11 L2AP_EN L2 address parity enable. When this bit is set, some of the L2 address signals are used in 
the parity generated on L2DP[0:7]. See Section C.11.5, “L2 Address and Data Parity 
Signals,” for the combinations supported. 
12 SWT_EN Software table search enable. Setting this bit causes one of three new exceptions when a 
TLB miss occurs. See C.6, “MPC755 Exceptions (Chapter 4),” and C.7, “MPC755 Memory 
Management (Chapter 5),” for more information on the use of software table search 
operations. 
13 HIGH_BAT_EN IBAT[4-7] and DBAT[4-7] enable. When this bit is set, four more IBAT and DBAT entries 
are available for translating blocks of memory. See C.4, “The MPC755 Programming 
Model (Chapter 2),” for more information on the SPR numbers used for accessing the new 
BATs. 
14-15 — Reserved 
16-18 IWLCK[0-2] Instruction cache way lock. Useful for locking blocks of instructions into the instruction 
cache for time-critical applications that require deterministic behavior. See 
Section C.5.2.3, “Performing Data and Instruction Cache Locking.” 
000 = no ways locked 
001 = way0 locked 
010 = way0 through way1 locked 
011 = way0 through way2 locked 
100 = way0 through way3 locked 
101 = way0 through way4 locked 
110 = way0 through way5 locked 
111 = Reserved 
19-23 — Reserved 
24-26 DWLCK[0-2] Data cache way lock. Useful for locking blocks of data into the data cache for time-critical 
applications where deterministic behavior is required. See Section C.5.2.3, “Performing 
Data and Instruction Cache Locking.” 
000 = no ways locked 
001 = way0 locked 
010 = way0 through way1 locked 
011 = way0 through way2 locked 
100 = way0 through way3 locked 
101 = way0 through way4 locked 
110 = way0 through way5 locked 
111 = Reserved 
27-31 — Reserved 
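
The field positions in Table C-3 can be turned into masks as in the following Python sketch. It uses the manual's bit numbering, in which bit 0 is the most-significant bit of the 32-bit register. The helper names (`ppc_bit`, `hid2_value`, `locked_ways`) are hypothetical; the sketch only composes and interprets values and does not access the register itself.

```python
def ppc_bit(n: int) -> int:
    """Mask for bit n of a 32-bit register, where bit 0 is the most significant."""
    return 1 << (31 - n)


L2AP_EN = ppc_bit(11)
SWT_EN = ppc_bit(12)
HIGH_BAT_EN = ppc_bit(13)
IWLCK_SHIFT = 31 - 18   # IWLCK occupies bits 16-18
DWLCK_SHIFT = 31 - 26   # DWLCK occupies bits 24-26


def hid2_value(l2ap=False, swt=False, high_bat=False, iwlck=0, dwlck=0) -> int:
    """Compose a HID2 value; iwlck and dwlck are the 3-bit way-lock codes."""
    for code in (iwlck, dwlck):
        if not 0 <= code <= 0b110:
            raise ValueError("way-lock code must be 0b000-0b110 (0b111 is reserved)")
    value = (iwlck << IWLCK_SHIFT) | (dwlck << DWLCK_SHIFT)
    if l2ap:
        value |= L2AP_EN
    if swt:
        value |= SWT_EN
    if high_bat:
        value |= HIGH_BAT_EN
    return value


def locked_ways(code: int) -> list:
    """List the way numbers locked by a 3-bit IWLCK or DWLCK code."""
    if not 0 <= code <= 0b110:
        raise ValueError("way-lock code must be 0b000-0b110 (0b111 is reserved)")
    return list(range(code))


print(hex(hid2_value(swt=True, high_bat=True)), locked_ways(0b011))
```

For instance, enabling software table search together with the additional BATs sets bits 12 and 13, and a way-lock code of 011 corresponds to way0 through way2.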





C.4.2 MPC750 and MPC755 Instruction Use 


This section describes some restrictions on the use of the stfd, mtsr, and mtsrin 
instructions on both the MPC750 and MPC755. In addition, the dcbz instruction has cache 
coherency implications described in Section C.5.1.2, “dcbz and L1 Cache Coherency.” 


C.4.2.1  stfd Instruction Use 


The MPC750 and MPC755 require that the FPRs be initialized with floating-point values 
before the stfd instruction is used. Otherwise, a random power-on value for an FPR may 
cause unpredictable device behavior when the stfd instruction is executed. Note that any 
floating-point value loaded into the FPRs is acceptable. 


C.4.2.2 isync Instruction Use with mtsr and mtsrin 


The MPC750 and MPC755 have a restriction on the use of the mtsr and mtsrin instructions 
not described in the Programming Environments Manual or in Chapter 2, “Programming 
Model.” The MPC750 and MPC755 require that an isync instruction be executed after 
either an mtsr or mtsrin instruction. This isync instruction must occur after the execution 
of the mtsr or mtsrin and before the data address translation mechanism uses any of the 
on-chip segment registers. 


C.4.3 tlbld and tlbli Instructions 


This section provides a detailed description of the two implementation-specific instructions 
used for software table search operations—tlbld and tlbli (same as the MPC603e). 


The address translation mechanism is defined in terms of segment descriptors and page 
table entries (PTEs) used by processors that implement the PowerPC architecture to locate 
the effective-to-physical address mapping for a particular access. The PTEs reside in page 
tables in memory. As defined for 32-bit implementations by the PowerPC architecture, 
segment descriptors reside in 16 on-chip segment registers. 


Similar to the MPC603e, the MPC755 provides two implementation-specific instructions 
(tlbld and tlbli) that are used by software table search operations following TLB misses to 
load TLB entries on-chip (not provided by the MPC750 because the MPC750 does not 
support software table search operations). 


Refer to C.7, “MPC755 Memory Management (Chapter 5),” for more information about 
the TLB registers and software table search operations with the MPC755. Table C-4 lists 
the TLB instructions implemented in the MPC755. 


Table C-4. Translation Lookaside Buffer Management Instructions 

Name                         Mnemonic   Operand Syntax 
TLB Invalidate Entry         tlbie      rB 
TLB Synchronize              tlbsync    (none) 
Load Data TLB Entry          tlbld      rB 
Load Instruction TLB Entry   tlbli      rB 





Because the presence and exact semantics of the translation lookaside buffer management 
instructions are implementation-dependent, system software should incorporate uses of the 
instructions into subroutines to maximize compatibility with programs written for other 
processors. 


For more information on the PowerPC instruction set, refer to Chapter 4, “Addressing 
Modes and Instruction Set Summary,” and Chapter 8, “Instruction Set,” in the 
Programming Environments Manual. 


tlbld                                                              tlbld 

Load Data TLB Entry                                         Integer Unit 

tlbld rB 

Instruction encoding (the 00000 fields in bits 6-15 are reserved): 

Bits    0-5   6-10    11-15   16-20   21-30   31 
Value   31    00000   00000   B       978     0 

EA ← (rB) 
TLB entry created from DCMP and RPA 
DTLB entry selected by EA[15-19] and SRR1[WAY] ← created TLB entry 








The EA is the contents of rB. The tlbld instruction loads the contents of the data PTE 
compare (DCMP) and required physical address (RPA) registers into the first word of the 
selected data TLB entry. The specific DTLB entry to be loaded is selected by the EA and 
the SRR1[WAY] bit. 
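
The entry selection described above can be sketched as a small calculation. This is only an illustration in Python, not MPC755 code: the function name dtlb_target is hypothetical, and the sketch assumes the usual PowerPC convention that bit 0 is the most-significant bit of the 32-bit effective address, so EA[15-19] is the 5-bit field starting 12 bits above the least-significant bit.

```python
def dtlb_target(ea, srr1_way):
    """Return (set_index, way) for a 32-bit effective address.

    Hypothetical helper: set_index is EA[15-19] in PowerPC bit
    numbering (bit 0 = most significant), way is the SRR1[WAY] bit.
    """
    if not 0 <= ea <= 0xFFFFFFFF:
        raise ValueError("EA must be a 32-bit value")
    if srr1_way not in (0, 1):
        raise ValueError("SRR1[WAY] is a single bit")
    # Bit 19 has weight 2**(31 - 19) = 2**12, so shift right by 12
    # and keep five bits.
    set_index = (ea >> 12) & 0x1F
    return set_index, srr1_way


if __name__ == "__main__":
    for ea in (0x00000000, 0x0001F000, 0x12345678):
        print(hex(ea), dtlb_target(ea, 0))
```

Because only EA[15-19] takes part in the selection, two addresses that differ only in other bits select the same entry pair, and SRR1[WAY] decides which of the two is overwritten.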


The tlbld instruction should only be executed when address translation is disabled 
(MSR[IR] = 0 and MSR[DR] = 0). 


Note that it is possible to execute the tlbld instruction when address translation is enabled; 
however, extreme caution should be used in doing so. If data address translation is enabled 
(MSR[DR] = 1) tlbld must be preceded by a sync instruction and succeeded by a context 
synchronizing instruction. 


Note also that care should be taken to avoid modification of the instruction TLB entries that 
translate current instruction prefetch addresses. 


This is a supervisor-level instruction; it is also an MPC755-specific instruction, and not part 
of the PowerPC instruction set. 


Other registers altered: 


• None 


tlbli                                                              tlbli 

Load Instruction TLB Entry                                  Integer Unit 

tlbli rB 

Instruction encoding (the 00000 fields in bits 6-15 are reserved): 

Bits    0-5   6-10    11-15   16-20   21-30   31 
Value   31    00000   00000   B       1010    0 

EA ← (rB) 
TLB entry created from ICMP and RPA 
ITLB entry selected by EA[15-19] and SRR1[WAY] ← created TLB entry 





The EA is the contents of rB. The tlbli instruction loads the contents of the instruction PTE 
compare (ICMP) and required physical address (RPA) registers into the first word of the 
selected instruction TLB entry. The specific ITLB entry to be loaded is selected by the EA 
and the SRR1[WAY] bit. 


The tlbli instruction should only be executed when address translation is disabled 
(MSR[IR] = 0 and MSR[DR] = 0). 


Note that it is possible to execute the tlbli instruction when address translation is enabled; 
however, extreme caution should be used in doing so. If instruction address translation is 
enabled (MSR[IR] = 1), tlbli must be followed by a context synchronizing instruction such 
as isync or rfi. 


Note also that care should be taken to avoid modification of the instruction TLB entries that 
translate current instruction prefetch addresses. 


This is a supervisor-level instruction; it is also an MPC755-specific instruction, and not part 
of the PowerPC instruction set. 


Other registers altered: 


• None 


C.5 MPC755 L1 Instruction and Data Cache Operation (Chapter 3) 


This section describes L1 cache coherency issues and also describes the new instruction and 
data cache way locking features of the MPC755 embedded processor. Otherwise, the L1 
instruction and data cache operation is the same as the MPC750. 


The MPC755 includes a mechanism for allocating cache entries for a particular group of 
ways for both the instruction and data caches. If a way is locked, the data loaded in that 
cache way will not be replaced by an access to another address; that is, none of the entries 
in a locked cache way are re-allocated. One to six of the eight ways in a cache can be locked 
with the IWLCK and DWLCK bits of the HID2 register. All eight ways of a cache can be 
locked using the ILOCK or DLOCK bits of the HID0 register. 


Note that integrated devices based on the MPC603e G2 processor core may also implement 
entire cache locking and cache way locking. However, the G2-based processor caches are only 
four-way set-associative, so only up to three ways can be locked. Additionally, although the 
G2 core processors define similar IWLCK[0-2] and DWLCK[0-2] fields in HID2, the bit 
encodings that enable way locking differ from those used in the MPC755 and do not 
correspond. 


C.5.1 L1 Cache Coherency 


This section describes some L1 coherency precautions for the MPC755 in addition to those 
described in Chapter 3, “L1 Instruction and Data Cache Operation.” 


C.5.1.1 Coherency Precautions in Single Processor Systems 


Note that as described in Chapter 3, “L1 Instruction and Data Cache Operation,” great care 
must be taken when the WIMG bits are changed in the MMU. The following coherency 
paradoxes can be encountered within a single-processor system: 


• Load or store to a caching-inhibited page (WIMG = x1xx) and a cache hit occurs. 


The MPC755 ignores any hits to an L1 cache block in a memory space marked 
caching-inhibited (WIMG = x1xx). The L1 cache is bypassed and the access is 
performed externally as if there were no hit. The data in the cache is not pushed, and 
the cache block is not invalidated. 


This operation is similar to that of the MPC750 except that in the case of the 
MPC750, the access is performed to the 60x bus. In the case of the MPC755, the 
access is performed to the private memory space if private memory is enabled, and 
if the upper order address bits match the value in L2PM[PMBA]. Alternatively, the 
access may hit in the L2 cache if it was previously designated as cacheable but the 
WIMG bits were changed so that the access is cache-inhibited. Although the access 
may hit in the L2 (if the data was previously loaded when the WIMG bits were set 
to caching-allowed), the L2 cache does not allocate any new entries for 
caching-inhibited data. This L2 cache behavior is different than that of the MPC750 
for this case. 


• Store to a page marked write-through (WIMG = 1xxx) and a cache hit occurs to a 
modified cache block. 


The MPC750 and MPC755 work identically in this case and ignore the modified bit 
in the cache tag. The cache block is updated during the write-through operation but 
the block remains in the modified state (M). 


Note that when WIM bits are changed in the page tables or BAT registers, it is critical that 
the cache contents reflect the new WIM bit settings. For example, if a block or page that 
had allowed caching becomes caching-inhibited, software should ensure that the 
appropriate cache blocks are flushed to memory and invalidated. 


C.5.1.2 dcbz and L1 Cache Coherency 


Both the MPC750 and MPC755 processors require protection in the use of the dcbz 
instruction in order to guarantee cache coherency in a multiprocessor system. Specifically, 
the dcbz instruction must be: 


• Either enveloped by high-level software synchronization protocols (such as 
semaphores), or 


• Preceded by execution of a dcbf instruction to the same address. 


One of these precautions must be taken in order to guarantee that there are no simultaneous 
cache hits from a dcbz instruction and a snoop to that address. If these two events occur 
simultaneously, stale data may occur, causing system failures. 


C.5.2 Cache Locking 


This section describes the cache locking and cache-way locking features of the MPC755. 


C.5.2.1 Cache Locking Terminology 


Cache locking is the ability to prevent some or all of a microprocessor’s instruction or data 
cache from being overwritten. Cache locking can be set for either an entire cache or for 
individual ways within the cache as follows: 


• Entire Cache Locking: When an entire cache is locked, data for read hits within the 
cache are supplied to the requesting unit in the same manner as hits from an 
unlocked cache. Similarly, writes that hit in the data cache are written to the cache 
in the same way as write hits to an unlocked cache. However, any access that misses 
in the cache is treated as a cache-inhibited access. Cache entries that are invalid at 
the time of locking remain invalid and inaccessible until the cache is unlocked. 
When the cache has been unlocked, all entries (including invalid entries) are 
available. Entire cache locking is inefficient if the number of instructions or the size 
of data to be locked is small compared to the cache size. 


• Way Locking: Locking only a portion of the cache is accomplished by locking 
ways within the cache. Locking always begins with the first way (way0) and is 
sequential; that is, locking ways 0, 1, and 2 is possible, but it is not possible to lock 
only way0 and way2. When using way locking, at least two ways must be left 
unlocked. The maximum number of lockable ways is six on the MPC755 embedded 
processor (way0-way5). 

Unlike entire cache locking, invalid entries in a locked way are accessible and 
available for data replacement. As hits to the cache fill invalid entries within a locked 
way, the entries become valid and locked. This behavior differs from entire cache 
locking in which invalid entries cannot be allocated. Unlocked ways of the cache 
behave normally. 


Table C-5 summarizes the MPC755 cache organization. 


Table C-5. Cache Organization 

Instruction Cache Size   Data Cache Size   Associativity   Block Size   Way Size 
32 Kbyte                 32 Kbyte          8-way           8 words      4 Kbyte 





C.5.2.2 Cache Locking Register Summary 


Table C-6 through Table C-8 outline the registers and bits used to perform cache locking on 
the MPC755 embedded processor. Refer to Chapter 2, “Programming Model,” for a 
complete description of the HID0 and MSR registers. Refer to C.4, “The MPC755 
Programming Model (Chapter 2),” for a complete description of the HID2 register. 


Table C-6. HID0 Bits Used to Perform Cache Locking 

Bits  Name   Description 
16    ICE    Instruction cache enable. This bit must be set for instruction cache locking. See 
             Section C.5.2.3.9, “Enabling the Instruction Cache.” 
17    DCE    Data cache enable. This bit must be set for data cache locking. See 
             Section C.5.2.3.1, “Enabling the Data Cache.” 
18    ILOCK  Instruction cache lock. Set to lock the entire instruction cache. See 
             Section C.5.2.3.14, “Entire Instruction Cache Locking.” 
19    DLOCK  Data cache lock. Set to lock the entire data cache. See Section C.5.2.3.6, 
             “Entire Data Cache Locking.” 
20    ICFI   Instruction cache flash invalidate. Setting and then clearing this bit invalidates 
             the entire instruction cache. See Section C.5.2.3.16, “Invalidating the 
             Instruction Cache (Even if Locked).” 
21    DCFI   Data cache flash invalidate. Setting and then clearing this bit invalidates the 
             entire data cache. See Section C.5.2.3.4, “Invalidating the Data Cache.” 
22    SPD    Speculative cache access disable. This bit must be cleared for instruction cache 
             locking. See Section C.5.2.3.13, “MPC755 Prefetching Considerations.” 
25    DCFA   Data cache flush assist. This bit must be set for data cache flushing. See 
             Section C.5.2.3.4, “Invalidating the Data Cache.” 
29    BHT    Branch history table enable. This bit must be cleared for instruction cache 
             locking. See Section C.5.2.3.13, “MPC755 Prefetching Considerations.” 
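
The bit numbers in Table C-6 count from the most-significant bit (bit 0), while the immediates in the assembly listings of this appendix are ordinary hexadecimal masks. As a rough aid for converting between the two, here is a minimal Python sketch. The helper name ppc_bit_mask and the dictionary names are hypothetical, and a 32-bit register is assumed.

```python
def ppc_bit_mask(bit, width=32):
    """Return the mask for a bit numbered from the most-significant end."""
    if not 0 <= bit < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - bit)


# Bit numbers copied from Table C-6.
HID0_BITS = {
    "ICE": 16, "DCE": 17, "ILOCK": 18, "DLOCK": 19,
    "ICFI": 20, "DCFI": 21, "SPD": 22, "DCFA": 25, "BHT": 29,
}
HID0_MASKS = {name: ppc_bit_mask(bit) for name, bit in HID0_BITS.items()}

if __name__ == "__main__":
    for name, mask in HID0_MASKS.items():
        print(f"{name:6s} bit {HID0_BITS[name]:2d} -> 0x{mask:08X}")
```

For example, DCE (bit 17) maps to 0x4000 and ILOCK (bit 18) maps to 0x2000, which are the immediates used with ori in the listings that set those bits.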











Table C-7. HID2 Bits Used to Perform Cache Locking 

Bits   Name   Description 
16-18  IWLCK  Instruction cache way lock. These bits are used to lock individual ways in the 
              instruction cache. See Section C.5.2.3.15, “Instruction Cache Way Locking.” 
24-26  DWLCK  Data cache way lock. These bits are used to lock individual ways in the data 
              cache. See Section C.5.2.3.7, “Data Cache Way Locking.” 


Table C-8. MSR Bits Used to Perform Cache Locking 

Bits  Name  Description 
16    EE    External interrupt enable. This bit must be cleared during instruction and data 
            cache loading. See Section C.5.2.3.3, “Disabling Exceptions for Data Cache 
            Locking.” 
19    ME    Machine check enable. This bit must be cleared during instruction and data cache 
            loading. See Section C.5.2.3.3, “Disabling Exceptions for Data Cache Locking.” 
26    IR    Instruction address translation. This bit must be set to enable instruction address 
            translation by the MMU. See Section C.5.2.3.2, “Address Translation for Data 
            Cache Locking.” 
27    DR    Data address translation. This bit must be set to enable data address translation 
            by the MMU. See Section C.5.2.3.2, “Address Translation for Data Cache 
            Locking.” 

















C.5.2.3 Performing Data and Instruction Cache Locking 


This section outlines the basic procedures for locking the data and instruction caches and 
provides some example code for locking the caches. The procedures for the data cache are 
described first, followed by the corresponding sections for locking the instruction cache. 
The basic procedures for cache locking are: 

• Enabling the cache 

• Enabling address translation for example code 

• Disabling exceptions 

• Loading the cache 

• Locking the cache (entire cache locking or cache way locking) 


In addition, this section describes how to invalidate the data and instruction caches, even 
when they are locked. 


The following sections describe the procedures for performing data cache locking on the 
MPC755. 


C.5.2.3.1 Enabling the Data Cache 


To lock the data cache, the data cache enable bit HID0[DCE], bit 17, must be set. The 
assembly code below enables the data cache: 


# Enable the data cache. This corresponds 
# to setting DCE bit in HID0 (bit 17) 

        mfspr   r1, HID0 
        ori     r1, r1, 0x4000 
        sync 
        mtspr   HID0, r1 


C.5.2.3.2 Address Translation for Data Cache Locking 


Two distinct memory areas must be set up to enable cache locking: 


• The first area is where the code that performs the locking resides and is executed 
from. 


• The second area is where the data to be locked resides. 


Both areas of memory must be in locations that are translated by the memory management 
unit (MMU). This translation can be performed either with the page table or the block 
address translation (BAT) registers. 


For the purposes of the cache locking example in this document, the two areas of memory 
are defined using the BAT registers. The first area is a 1-Mbyte area in the upper region of 
memory that contains the code performing the cache locking. The second area is a 
256-Mbyte block of memory (not all of the 256-Mbytes of memory is locked in the cache; 
this area is set up as an example) that contains the data to lock. Both memory areas use 
identity translation (the logical memory address equals the physical memory address). 


Table C-9 summarizes the BAT settings used in this example. 


Table C-9. Example BAT Settings for Cache Locking 

Area     Base Address   Memory Size   WIMG Bits   BATU Setting   BATL Setting 
First    0xFFF0_0000    1 Mbyte       0b0100 ¹    0xFFF0_001F    0xFFF0_0002 ¹ 
Second   0x0000_0000    256 Mbyte     0b0000      0x0000_1FFF    0x0000_0002 

¹ Cache-inhibited memory is not a requirement for data cache locking. A setting of 0xFFF0_0002 with a 
corresponding WIMG of 0b0000 marks the memory area as cacheable. 
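
To make the BATU and BATL values easier to check, the following Python sketch splits a register pair into its fields. It is an illustration only. The field positions used (BEPI and BRPN in bits 0-14, BL in BATU bits 19-29, Vs and Vp in BATU bits 30-31, WIMG in BATL bits 25-28, PP in BATL bits 30-31) are assumed from the 32-bit PowerPC BAT register definition in the Programming Environments Manual and should be confirmed there. The function name decode_bat is hypothetical.

```python
def decode_bat(batu, batl):
    """Split a BATU/BATL pair into fields (assumed 32-bit PowerPC layout)."""
    bl = (batu >> 2) & 0x7FF            # BATU[19-29], block length mask
    return {
        "base": batu & 0xFFFE0000,      # BATU[0-14], effective block base
        # Valid BL masks are runs of low-order ones; each step doubles
        # the block size starting from 128 Kbytes.
        "size_bytes": (bl + 1) * 128 * 1024,
        "vs": (batu >> 1) & 1,          # BATU[30], supervisor valid
        "vp": batu & 1,                 # BATU[31], user valid
        "phys": batl & 0xFFFE0000,      # BATL[0-14], physical block base
        "wimg": (batl >> 3) & 0xF,      # BATL[25-28]
        "pp": batl & 0x3,               # BATL[30-31]
    }


if __name__ == "__main__":
    for batu, batl in ((0xFFF0001F, 0xFFF00002), (0x00001FFF, 0x00000002)):
        fields = decode_bat(batu, batl)
        print(hex(batu), hex(batl),
              hex(fields["base"]), fields["size_bytes"],
              format(fields["wimg"], "04b"))
```

Under these assumptions a BATL value ending in 0x02 decodes to WIMG = 0b0000 (cacheable), while a value ending in 0x22 decodes to WIMG = 0b0100 (caching-inhibited).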


The block address translation upper (BATU) and block address translation lower (BATL) 
settings in Table C-9 can be used for both instruction block address translation (IBAT) and 
data block address translation (DBAT) registers. After the BAT registers have been set up, 
the MMU must be enabled. The assembly code below enables both instruction and data 
memory address translation: 


# Enable instruction and data memory address translation. This 
# corresponds to setting IR and DR in the MSR (bits 26 & 27) 

        mfmsr   r1 
        ori     r1, r1, 0x0030 
        mtmsr   r1 
        sync 


C.5.2.3.3 Disabling Exceptions for Data Cache Locking 


To ensure that exception handler routines do not execute while the cache is being loaded 
(which could possibly pollute the cache with undesired contents) all exceptions must be 
disabled. This is accomplished by clearing the appropriate bits in the machine state register 
(MSR). See Table C-10 for the bits within the MSR that must be cleared to ensure that 
exceptions are disabled. 


Table C-10. MSR Bits for Disabling Exceptions 

Bit   Name    Description 
16    EE      External interrupt enable 
19    ME      Machine check enable 
20    FE0 ¹   Floating-point exception mode 0 
23    FE1 ¹   Floating-point exception mode 1 

¹ The floating-point exception may not need to be disabled because the example code shown 
in this document that performs cache locking does not execute any floating-point operations. 


The following assembly code disables all asynchronous exceptions: 


# Clear the following bits from the MSR: 
# EE (16) ME (19) 
# FE0 (20) FE1 (23) 

        mfmsr   r1 
        lis     r2, 0xFFFF 
        ori     r2, r2, 0x66FF 
        and     r1, r1, r2 
        mtmsr   r1 
        sync 
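
The AND mask in the listing above can be derived from the bit numbers in Table C-10. The following Python sketch shows the arithmetic. The names ppc_bit_mask, msr_and_mask, and split_for_lis_ori are hypothetical, and bit 0 is assumed to be the most-significant bit of the 32-bit MSR.

```python
def ppc_bit_mask(bit, width=32):
    """Return the mask for a bit numbered from the most-significant end."""
    if not 0 <= bit < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - bit)


# MSR bits to clear, from Table C-10: EE, ME, FE0, FE1.
MSR_CLEAR_BITS = (16, 19, 20, 23)


def msr_and_mask(bits=MSR_CLEAR_BITS):
    """Build the 32-bit AND mask that clears the given MSR bits."""
    clear = 0
    for bit in bits:
        clear |= ppc_bit_mask(bit)
    return ~clear & 0xFFFFFFFF


def split_for_lis_ori(value):
    """Split a 32-bit constant into its upper and lower halfwords."""
    return value >> 16, value & 0xFFFF


if __name__ == "__main__":
    mask = msr_and_mask()
    upper, lower = split_for_lis_ori(mask)
    print(f"mask=0x{mask:08X} upper=0x{upper:04X} lower=0x{lower:04X}")
```

The four bits sum to 0x9900, so the AND mask is 0xFFFF66FF, which matches the 0xFFFF and 0x66FF halves built with lis and ori.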


C.5.2.3.4 Invalidating the Data Cache 


If a non-empty data cache has modified data, and the data cannot be discarded, the data 
cache must be flushed before it can be invalidated. Data cache flushing is accomplished by 
filling the data cache with known data and performing a flash invalidate or a series of dcbf 
instructions that force a flush and invalidation of the data cache block. 


The following code sequence shows how to flush the data cache: 


# r6 contains a block-aligned address in memory with which to fill 
# the data cache. For this example, address 0x0 is used 
        li      r6, 0x0 

# CTR = number of data blocks to load 
# Number of blocks = (16K) / (32 Bytes/block) 
#                  = 2^14 / 2^5 = 2^9 = 0x200 
        li      r1, 0x200 
        mtctr   r1 

# Save the total number of blocks in cache to r8 
        mr      r8, r1 

# Load the entire cache with known data 
loop:   lwz     r2, 0(r6) 
        addi    r6, r6, 32      # Find the next block 
        bdnz    loop            # Decrement the counter, and 
                                # branch if CTR != 0 

# Now, flush the cache with dcbf instructions 
        li      r6, 0x0         # Address of first block 
        mtctr   r8              # Number of blocks 
loop2:  dcbf    r0, r6 
        addi    r6, r6, 32      # Find the next block 
        bdnz    loop2           # Decrement the counter, and 
                                # branch if CTR != 0 
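
The loop count in the listing above follows from dividing the cache size by the block size. The comment in the listing works this out for a 16-Kbyte cache, whereas Table C-5 gives 32 Kbyte for each MPC755 L1 cache. The Python sketch below applies the same arithmetic to both sizes. The function name block_count is hypothetical, and which size applies to a given part should be confirmed against that part's cache organization.

```python
def block_count(cache_bytes, block_bytes=32):
    """Number of cache blocks in a cache of the given size."""
    if cache_bytes <= 0 or cache_bytes % block_bytes:
        raise ValueError("cache size must be a positive multiple of the block size")
    return cache_bytes // block_bytes


if __name__ == "__main__":
    for size_kbytes in (16, 32):
        count = block_count(size_kbytes * 1024)
        print(f"{size_kbytes} Kbyte cache: {count} blocks (0x{count:X})")
```

With 32-byte blocks, a 16-Kbyte cache has 0x200 blocks and a 32-Kbyte cache has 0x400 blocks.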





If the content of the data cache does not need to be flushed to memory, the cache can be 
directly invalidated. The entire data cache is invalidated through the data cache flash 
invalidate bit HID0[DCFI], bit 21. Setting HID0[DCFI] and then immediately clearing it 
causes the entire data cache to be invalidated. The following assembly code invalidates the 
entire data cache (does not flush modified entries): 


# Set and then clear the HID0[DCFI] bit, bit 21 

        mfspr   r1, HID0 
        mr      r2, r1 
        ori     r1, r1, 0x0400 
        mtspr   HID0, r1 
        mtspr   HID0, r2 
        sync 


C.5.2.3.5 Loading the Data Cache 


This section explains loading data into the data cache. The data cache can be loaded in 
several ways. The example in this document loads the data from memory. The following 
assembly code loads the data cache: 


# Assuming interrupts are turned off, cache has been flushed, 
# MMU on, and loading from contiguous cacheable memory. 
# r6 = Starting address of data to lock 
# r20 = Temporary register for loading into 
# CTR = Number of cache blocks to lock 
loop:   lwz     r20, 0(r6)      # Load data into d-cache 
        addi    r6, r6, 32      # Find next block to load 
        bdnz    loop            # CTR = CTR-1, branch if CTR != 0 
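
Before running the load loop, software has to choose the value placed in CTR and decide how many ways the region will occupy. The following Python sketch shows one way to work that out for a contiguous region. It assumes the 32-byte block and 4-Kbyte way sizes from Table C-5, and the function name lock_plan is hypothetical.

```python
BLOCK_BYTES = 32         # block size from Table C-5 (8 words)
WAY_BYTES = 4 * 1024     # way size from Table C-5
BLOCKS_PER_WAY = WAY_BYTES // BLOCK_BYTES


def lock_plan(start, length):
    """Return (first_block_address, block_count, ways_needed).

    block_count is the CTR value for the load loop; ways_needed is the
    number of ways a contiguous region of that many blocks occupies.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first = start & ~(BLOCK_BYTES - 1)
    last = (start + length - 1) & ~(BLOCK_BYTES - 1)
    blocks = (last - first) // BLOCK_BYTES + 1
    # Ceiling division: consecutive blocks map to consecutive sets, so
    # each set receives at most ceil(blocks / BLOCKS_PER_WAY) blocks.
    ways = -(-blocks // BLOCKS_PER_WAY)
    return first, blocks, ways


if __name__ == "__main__":
    print(lock_plan(0x1000, 4096))
    print(lock_plan(0x1010, 64))
```

A region that is not block-aligned touches one more block than its length alone suggests, which is why the sketch rounds the start address down and the end address up to block boundaries.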


C.5.2.3.6 Entire Data Cache Locking 


Locking of the entire data cache is controlled by the data cache lock bit (HID0[DLOCK], 
bit 19). Setting HID0[DLOCK] to 1 locks the entire data cache. To unlock the data cache, 
HID0[DLOCK] must be cleared to 0. Setting the DLOCK bit must be preceded by a sync 
instruction to prevent the data cache from being locked during a data access. The following 
assembly code locks the entire data cache: 


# Set the DLOCK bit in HID0 (bit 19) 

        mfspr   r1, HID0 
        ori     r1, r1, 0x1000 
        sync 
        mtspr   HID0, r1 


C.5.2.3.7 Data Cache Way Locking 


Data cache way locking is controlled by HID2[DWLCK], bits 24-26. Table C-11 shows the 
HID2[DWLCK 0-2] settings for the MPC755 embedded processor. 


Table C-11. MPC755 DWLCK[0-2] Encodings 

DWLCK[0-2]   Ways Locked 
0b000        No ways locked 
0b001        Way 0 locked 
0b010        Ways 0 and 1 locked 
0b011        Ways 0, 1, and 2 locked 
0b100        Ways 0, 1, 2, and 3 locked 
0b101        Ways 0, 1, 2, 3, and 4 locked 
0b110        Ways 0, 1, 2, 3, 4, and 5 locked 
0b111        Reserved 














The following assembly code locks way0 of the MPC755 data cache: 


# Lock way0 of the data cache 
# This corresponds to setting dwlck(0:2) to 0b001 (bits 24-26) 

        mfspr   r1, HID2 
        lis     r2, 0xFFFF 
        ori     r2, r2, 0xFF1F 
        and     r1, r1, r2 
        ori     r1, r1, 0x0020 
        sync 
        mtspr   HID2, r1 
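
The read-modify-write performed on HID2 can be modelled as a small calculation that clears the three-bit way-lock field and then inserts the new encoding. The Python sketch below is an illustration under stated assumptions: bit 0 is the most-significant bit, so DWLCK (bits 24-26) sits 5 bits above the least-significant bit and IWLCK (bits 16-18) sits 13 bits above it. The names DWLCK_SHIFT, IWLCK_SHIFT, and set_way_lock are hypothetical.

```python
DWLCK_SHIFT = 31 - 26    # HID2 bits 24-26
IWLCK_SHIFT = 31 - 18    # HID2 bits 16-18
MAX_LOCKED_WAYS = 6      # encoding 0b111 is reserved


def set_way_lock(hid2, ways_locked, shift):
    """Return hid2 with the 3-bit way-lock field at `shift` replaced."""
    if not 0 <= ways_locked <= MAX_LOCKED_WAYS:
        raise ValueError("ways_locked must be in the range 0-6")
    cleared = hid2 & ~(0x7 << shift) & 0xFFFFFFFF
    return cleared | (ways_locked << shift)


if __name__ == "__main__":
    print(hex(set_way_lock(0x00000000, 1, DWLCK_SHIFT)))
    print(hex(set_way_lock(0xFFFFFFFF, 0, DWLCK_SHIFT)))
```

Locking way 0 of the data cache therefore corresponds to an AND with 0xFFFFFF1F followed by an OR with 0x0020, the same constants the assembly builds with lis, ori, and, and ori.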


C.5.2.3.8 Invalidating the Data Cache (Even if Locked) 


There are two methods to invalidate the data cache: 

• Invalidate the entire cache by setting and then immediately clearing the data cache 
flash invalidate bit HID0[DCFI], bit 21. Even when a cache is locked, toggling the 
DCFI bit invalidates all of the data cache. 

• The data cache block invalidate (dcbi) instruction can be used to invalidate 
individual cache blocks. The dcbi instruction invalidates blocks locked (either entire 
or way-locked) within the data cache. 


The following sections describe the procedures for performing instruction cache locking on 
the MPC755. 


C.5.2.3.9 Enabling the Instruction Cache 


To lock the instruction cache, the instruction cache enable bit HID0[ICE], bit 16, must be 
set. The assembly code below enables the instruction cache: 


# Enable the instruction cache. This corresponds 
# to setting ICE bit in HID0 (bit 16) 

        mfspr   r1, HID0 
        ori     r1, r1, 0x8000 
        sync 
        mtspr   HID0, r1 


C.5.2.3.10 Address Translation for Instruction Cache Locking 


Two distinct memory areas must be set up to enable cache locking: 


• The first area is where the code that performs the locking resides and is executed 
from. 


• The second area is where the instructions to be locked reside. 


Both areas of memory must be in locations that are translated by the memory management 
unit (MMU). This translation can be performed either with the page table or the block 
address translation (BAT) registers. 


For the purposes of the cache locking example in this document, two areas of memory are 
defined using the BAT registers. The first area is a 1-Mbyte area in the upper region of 
memory that contains the code performing the cache locking. This area of memory must be 
cache-inhibited for instruction cache locking. The second area is a 256-Mbyte block of 
memory (not all of the 256-Mbytes of memory is locked in the cache; this area is set up as 
an example) that contains the instructions to lock. Both memory areas use identity 
translation (the logical memory address equals the physical memory address). Table C-12 
summarizes the BAT settings used in this example. 


Table C-12. Example BAT Settings for Cache Locking 

Area     Base Address   Memory Size   WIMG Bits   BATU Setting   BATL Setting 
First    0xFFF0_0000    1 Mbyte       0b0100 ¹    0xFFF0_001F    0xFFF0_0022 ¹ 
Second   0x0000_0000    256 Mbyte     0b0000      0x0000_1FFF    0x0000_0002 

¹ 0xFFF0_0022 defines a cache-inhibited memory area used for instruction cache locking, and corresponds 
to a WIMG of 0b0100. Cache-inhibited memory is not a requirement for data cache locking. A setting of 
0xFFF0_0002 with a corresponding WIMG of 0b0000 marks the memory area as cacheable. 


The block address translation upper (BATU) and block address translation lower (BATL) 
settings in Table C-12 can be used for both instruction block address translation (IBAT) and 
data block address translation (DBAT) registers. After the BAT registers have been set up, 
the MMU must be enabled. 


The assembly code below enables both instruction and data memory address translation: 


# Enable instruction and data memory address translation. This 
# corresponds to setting IR and DR in the MSR (bits 26 & 27) 

        mfmsr   r1 
        ori     r1, r1, 0x0030 
        mtmsr   r1 
        sync 


C.5.2.3.11 Disabling Exceptions for Instruction Cache Locking 


To ensure that exception handler routines do not execute while the cache is being loaded 
(which could possibly pollute the cache with undesired contents) all exceptions must be 
disabled. This is accomplished by clearing the appropriate bits in the machine state register 
(MSR). See Table C-13 for the bits within the MSR that must be cleared to ensure that 
exceptions are disabled. 


Table C-13. MSR Bits for Disabling Exceptions 

Bit   Name    Description 
16    EE      External interrupt enable 
19    ME      Machine check enable 
20    FE0 ¹   Floating-point exception mode 0 
23    FE1 ¹   Floating-point exception mode 1 

¹ The floating-point exception may not need to be disabled because the example code shown 
in this document that performs cache locking does not execute any floating-point operations. 


The following assembly code disables all asynchronous exceptions: 


# Clear the following bits from the MSR: 
# EE (16) ME (19) 
# FE0 (20) FE1 (23) 

        mfmsr   r1 
        lis     r2, 0xFFFF 
        ori     r2, r2, 0x66FF 
        and     r1, r1, r2 
        mtmsr   r1 
        sync 


C.5.2.3.12 Preloading Instructions into the Instruction Cache 


To optimize performance, processors that implement the PowerPC architecture 
automatically prefetch instructions into the instruction cache. This feature can be used to 
preload explicit instructions into the cache even when it is known that their execution will 
be canceled. Although the execution of the instructions is canceled, the instructions remain 
valid in the instruction cache. 


Because instructions are intentionally executed speculatively, care must be taken to ensure 
that all I/O memory is marked guarded. Otherwise, speculative loads and stores to I/O space 
could potentially cause data loss. See the Programming Environments Manual for a full 
discussion of guarded memory. 


The code that prefetches must be in cache-inhibited memory as in the following example: 


# Assuming exceptions are disabled, cache has been flushed, 
# the MMU is on, and we are executing in a cache-inhibited 
# location in memory 
# LR and r6 = Starting address of code to lock 
# CTR = Number of cache blocks to lock 
# r2 = non-zero numerator and denominator 
# 'loop' must begin on an 8-byte boundary to ensure that 
# the divw and beqlr+ are fetched on the same cycle. 

        .orig 0xFFF04000 

loop:   divw.   r2, r2, r2      # LONG divide w/ non-zero result 
        beqlr+                  # Cause the prefetch to happen 
        addi    r6, r6, 32      # Find next block to prefetch 
        mtlr    r6              # set the next block 
        bdnz-   loop            # Decrement the counter and 
                                # branch if CTR != 0 














In the above example, both the divw. and beqlr+ instructions are fetched at the same time 
(this assumes a 64-bit 60x data bus; the preloading code does not work for a 32-bit data bus) 
due to their placement on a double-word boundary. The divide instruction was chosen 
because it takes many cycles to execute. During execution of the divide, the processor starts 
fetching instructions speculatively at the target destination of the branch instruction. The 
speculation occurs because the branch is statically predicted as taken. This speculative 
fetching causes the cache block that is pointed to by the link register (LR) to be loaded into 
the cache. Because the divw. instruction always produces a non-zero result, the beqlr+ is 
not taken and execution of all speculatively fetched instructions is canceled. However, the 
instructions remain valid in the cache. 


If the destination instruction stream contains an unconditional branch to another memory 
location, it is possible to also prefetch the destination of the unconditional branch 
instruction. This does not cause a problem if the destination of the unconditional branch is 
also inside the area of memory that needs to be preloaded. But if the destination of the 
unconditional branch is not in the area of memory to be loaded, then care must be taken to 
ensure that the branch destination is to an area of memory that is cache inhibited. 
Otherwise, unintentional instructions may be locked in the cache and the desired 
instructions may not be in their expected way within the cache. 


C.5.2.3.13 MPC755 Prefetching Considerations 


Because the instruction cache preloading code relies on static branch prediction to ensure 
that the beqlr+ instruction is predicted as taken, speculative cache access must be enabled. 
Speculative cache access is controlled by the speculative cache access disable bit 
HID0[SPD], bit 22. This bit must be cleared to ensure that instructions can be speculatively 
loaded from the instruction cache. 


Also, the instruction cache preloading code will not work when dynamic branch prediction 
is enabled. To ensure that MPC755 dynamic branch prediction is disabled, the branch 
history table bit HID0[BHT], bit 29, must be cleared. By default, the BHT is cleared out of 
reset. 
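
The HID0 preconditions for the preloading code (ICE set, SPD cleared, BHT cleared) can be summarized as a check on a HID0 value. The Python sketch below is only an illustration of that check, with bit 0 assumed to be the most-significant bit. The constant names and the function icache_preload_problems are hypothetical and are not part of any MPC755 software interface.

```python
HID0_ICE = 1 << (31 - 16)   # instruction cache enable, bit 16
HID0_SPD = 1 << (31 - 22)   # speculative cache access disable, bit 22
HID0_BHT = 1 << (31 - 29)   # branch history table enable, bit 29


def icache_preload_problems(hid0):
    """List the HID0 settings that would defeat the preloading code."""
    problems = []
    if not hid0 & HID0_ICE:
        problems.append("ICE is clear: instruction cache is disabled")
    if hid0 & HID0_SPD:
        problems.append("SPD is set: speculative cache access is disabled")
    if hid0 & HID0_BHT:
        problems.append("BHT is set: dynamic branch prediction is enabled")
    return problems


if __name__ == "__main__":
    for value in (0x00008000, 0x00008204):
        print(hex(value), icache_preload_problems(value) or "ok")
```

An empty list means the three conditions named in this section are satisfied. It does not cover the other requirements, such as executing from cache-inhibited memory.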


C.5.2.3.14 Entire Instruction Cache Locking 


Locking the entire instruction cache is controlled by the instruction cache lock bit 
(HID0[ILOCK], bit 18). Setting HID0[ILOCK] locks the entire instruction cache, and 
clearing HID0[ILOCK] allows the instruction cache to operate normally. The setting of 
HID0[ILOCK] should be preceded by an isync instruction to prevent the instruction cache 
from being locked during an instruction access. The following assembly code locks the 
contents of the entire instruction cache: 


# Set the ILOCK bit in HID0 (bit 18) 

        mfspr   r1, HID0 
        ori     r1, r1, 0x2000 
        isync 
        mtspr   HID0, r1 


C.5.2.3.15 Instruction Cache Way Locking 


Instruction cache way locking is controlled by HID2[IWLCK], bits 16-18. Table C-14 
shows the HID2[IWLCK 0-2] settings for the MPC755 embedded processor. 


Table C-14. MPC755 IWLCK[0-2] Encodings 

IWLCK[0-2]   Ways Locked 
0b000        No ways locked 
0b001        Way 0 locked 
0b010        Ways 0 and 1 locked 
0b011        Ways 0, 1, and 2 locked 
0b100        Ways 0, 1, 2, and 3 locked 
0b101        Ways 0, 1, 2, 3, and 4 locked 
0b110        Ways 0, 1, 2, 3, 4, and 5 locked 
0b111        Reserved 


The assembly code below locks way0 of the MPC755 instruction cache: 





# Lock way0 of the instruction cache 
# This corresponds to setting iwlck(0:2) to 0b001 (bits 16-18) 

        mfspr   r1, HID2 
        lis     r2, 0xFFFF 
        ori     r2, r2, 0x1FFF 
        and     r1, r1, r2 
        ori     r1, r1, 0x2000 
        isync 
        mtspr   HID2, r1 


C.5.2.3.16 Invalidating the Instruction Cache (Even if Locked) 


There are two methods to invalidate the instruction cache. In the first method, invalidate the 
entire cache by setting and then immediately clearing the instruction cache flash invalidate 
bit (HID0[ICFI], bit 20). Even when a cache is locked, toggling the ICFI bit invalidates all 
of the instruction cache. The following assembly code invalidates the entire instruction 
cache: 


# Set and then clear the HID0[ICFI] bit, bit 20 

        mfspr   r1, HID0 
        mr      r2, r1 
        ori     r1, r1, 0x0800 
        mtspr   HID0, r1 
        mtspr   HID0, r2 
        sync 


In the second method, the instruction cache block invalidate (icbi) instruction can be used 
to invalidate individual cache blocks. The icbi instruction invalidates blocks in an entirely 
locked instruction cache for the MPC750 and the MPC755 microprocessors. On the 
MPC755 embedded processor, the icbi instruction invalidates way-locked blocks within the 
instruction cache. 


C.6 MPC755 Exceptions (Chapter 4) 


The exception model for the MPC755 is the same as that described in Chapter 4, 
“Exceptions,” except as described in this section. For both the MPC750 and MPC755, the 
thermal assist unit, the decrementer register, and the performance monitor cannot be used 
in combination at any one time. If exceptions for any two of these functional blocks 
are enabled together, multiple exceptions caused by any of these three blocks cause 
unpredictable results. 


The MPC755 has three new exceptions used to support software table search operations 
(the same as the MPC603e). Software table searching is enabled by setting 
HID2[SWT_EN], bit 12. When this bit is cleared, the MPC755 uses the hardware table 
searching mechanism of the MPC750 when a miss occurs in an on-chip TLB. When 
HID2[SWT_EN] = 1, software table searching is enabled and a TLB miss causes one of the 
exceptions described in this section. 

See Section C.4.3, “tlbld and tlbli Instructions,” for a detailed explanation of the tlbli and 
tlbld instructions used to load the TLBs. See Section C.7, “MPC755 Memory Management 
(Chapter 5),” for a more detailed explanation of the other resources used to perform table 
search operations in software and example exception handlers. 


The three MMU exceptions used for software table search operations are described in 
Table C-15. 

Table C-15. Software Table Search Exceptions and Conditions 

Exception Type            Vector Offset   Causing Conditions 
                          (hex) 
Instruction TLB miss      01000           An instruction TLB miss exception is caused when an 
                                          effective address for an instruction fetch cannot be 
                                          translated by the ITLB. 
Data TLB miss for load    01100           A data TLB miss for load exception is caused when an 
                                          effective address for a data load operation cannot be 
                                          translated by the DTLB. 
Data TLB miss for store   01200           A data TLB miss for store exception is caused when an 
                                          effective address for a data store operation cannot be 
                                          translated by the DTLB, or where a DTLB hit occurs, and 
                                          the change bit in the PTE must be set due to a data 
                                          store operation. 





The SRR0, SRR1, and MSR registers are used by the MPC755 when an exception occurs. 
Register settings for the instruction and data TLB miss exceptions are described in 
Table C-16. 


Table C-16. Instruction and Data TLB Miss Exceptions—Register Settings 











Register   Setting   Description 
SRR0                 Set to the address of the next instruction to be executed in the 
                     program for which the TLB miss exception was generated. 
SRR1       0-3       Loaded from condition register CR0 field 
           4-11      Cleared 
           12        KEY. Key for TLB miss (either Ks or Kp from segment register, 
                     depending on whether the access is a user or supervisor access). 
                     See Figure C-5. 
           13        D/I. Data or instruction access 
                       0 = data TLB miss 
                       1 = instruction TLB miss 
           14        WAY. Next TLB set to be replaced (set per LRU) 
                       0 = replace TLB associativity set 0 
                       1 = replace TLB associativity set 1 
           15        S/L. Store or load data access 
                       0 = data TLB miss on load 
                       1 = data TLB miss on store (or C = 0) 
           16-31     Loaded from bits 16-31 of the MSR 
MSR (see             POW  0     EE   0     FE0  0     IR  0 
note 1)              ILE  —     PR   0     SE   0     DR  0 
                     IP   —     FP   0     BE   0     RI  0 
                     ME   —     FE1  0     LE   Set to value of ILE 







1 MSR[14] (the TGPR bit) of the MPC603e processor provided control for a separate set of four temporary GPRs 
that could be used as general-purpose registers in the TLB miss exception handler routines. MSR[14] is reserved 
on the MPC755, and the new SPRG[4—7] can be used for the TLB miss handler code. 


The MPC755 automatically saves the values of CR[CR0] of the executing context to 
SRR1[0-3]. Thus, the exception handler can set CR[CR0] bits and branch accordingly in 
the exception handler routine, without having to save the existing CR[CR0] bits. However, 
the exception handler must restore these bits to CR[CR0] before executing the rfi 
instruction. 

Also saved in SRR1 are two bits identifying the type of miss (SRR1[D/I] identifies 
instruction or data, and SRR1[S/L] identifies a store or load). Additionally, SRR1[WAY] 
identifies the associativity class of the TLB entry selected for replacement by the LRU 
algorithm. The software can change this value, effectively overriding the replacement 
algorithm. Finally, the SRR1[KEY] bit is used by the table search software to determine if 
there is a protection violation associated with the access (useful on data write misses for 
determining if the C bit should be updated in the table). 
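
As an illustration of the SRR1 layout in Table C-16, the following sketch unpacks the status fields that a handler would test. It is a software model only; the names `TlbMissStatus` and `decode_srr1` are hypothetical.

```python
from typing import NamedTuple


class TlbMissStatus(NamedTuple):
    cr0: int              # SRR1[0-3], saved CR0 field
    key: int              # SRR1[12], KEY
    is_instruction: bool  # SRR1[13], D/I (1 = instruction TLB miss)
    way: int              # SRR1[14], WAY (set selected for replacement)
    is_store: bool        # SRR1[15], S/L (1 = store, or C = 0)
    msr_low: int          # SRR1[16-31], copied from MSR[16-31]


def decode_srr1(srr1: int) -> TlbMissStatus:
    """Split a 32-bit SRR1 value saved by a TLB miss exception."""
    return TlbMissStatus(
        cr0=(srr1 >> 28) & 0xF,
        key=(srr1 >> 19) & 1,
        is_instruction=bool((srr1 >> 18) & 1),
        way=(srr1 >> 17) & 1,
        is_store=bool((srr1 >> 16) & 1),
        msr_low=srr1 & 0xFFFF,
    )
```

The shift amounts are 31 minus the PowerPC bit number, so KEY (bit 12) is tested with mask 0x00080000, the same constant the example handler uses with andis.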


The key bit, saved in SRR1 for a TLB miss exception, is derived as shown in Figure C-5. 


Select KEY from segment register: 
    If MSR[PR] = 0, KEY = Ks 
    If MSR[PR] = 1, KEY = Kp 


Figure C-5. Derivation of Key Bit for SRR1 
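
The selection in Figure C-5 can be sketched as follows. The bit positions used here (MSR[PR] as MSR bit 17, Ks and Kp as segment register bits 1 and 2) come from the PowerPC architecture definition rather than from this section, so treat them as assumptions of the example; `select_key` is a hypothetical name.

```python
MSR_PR = 1 << (31 - 17)  # MSR[PR]; position assumed from the architecture
SR_KS = 1 << (31 - 1)    # segment register Ks (supervisor key)
SR_KP = 1 << (31 - 2)    # segment register Kp (user key)


def select_key(msr: int, segment_register: int) -> int:
    """Return the KEY bit as derived in Figure C-5."""
    if msr & MSR_PR:
        return 1 if segment_register & SR_KP else 0  # user access: Kp
    return 1 if segment_register & SR_KS else 0      # supervisor access: Ks
```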


C.6.1 Instruction TLB Miss Exception (0x01000) 


When the effective address for an instruction fetch operation cannot be translated by the 
ITLBs or IBATs, an instruction TLB miss exception is generated. Register settings for the 
instruction and data TLB miss exceptions are described in Table C-16. If the instruction 
TLB miss exception handler fails to find the desired PTE, then a page fault must be 
synthesized. The handler must restore the machine state before invoking the ISI exception 
(0x00400). 


When an instruction TLB miss exception is taken, instruction execution for the handler 
begins at offset 0x01000 from the physical base address indicated by MSR[IP]. 


C.6.2 Data TLB Miss for Load Exception (0x01100) 


When the effective address for a data load or cache operation cannot be translated by the 
DTLBs or DBATs, a data TLB miss for load exception is generated. Register settings for 
the instruction and data TLB miss exceptions are described in Table C-16. If the data TLB 
miss exception handler fails to find the desired PTE, then a page fault must be synthesized. 
The handler must restore the machine state before invoking the DSI exception (0x00300). 

When a data TLB miss for load exception is taken, instruction execution for the handler 
begins at offset 0x01100 from the physical base address indicated by MSR[IP]. 


C.6.3 Data TLB Miss for Store Exception (0x01200) 


When the effective address for a data store or cache operation cannot be translated by the 
DTLBs or DBATs, a data TLB miss for store exception is generated. The data TLB miss 
for store exception is also taken when the changed bit of a matching DTLB entry is cleared 
(C = 0) and must be updated for a store operation. Register settings for the instruction and 
data TLB miss exceptions are described in Table C-16. 


If the data TLB miss exception handler fails to find the desired PTE, then a page fault must 
be synthesized. The handler must restore the machine state before invoking the DSI 
exception (0x00300). 


When a data TLB miss for store exception is taken, instruction execution for the handler 
begins at offset 0x01200 from the physical base address indicated by MSR[IP]. 


C.7 MPC755 Memory Management (Chapter 5) 


The MPC755 implements a virtual memory management scheme that is compliant with the 
PowerPC architecture for 32-bit microprocessors and that implements the software table 
searching features of the MPC603e. The organization of the memory management unit 
(MMU) hardware is as follows: 
• Same as MPC750 
— 128-entry, two-way set associative data TLB 
— 128-entry, two-way set associative instruction TLB 
— Sixteen segment registers 
— Automatic hardware table search operations 
• New features in the MPC755 
— 4- or 8-entry (HID2-selectable), fully associative instruction BAT array 
— 4- or 8-entry (HID2-selectable), fully associative data BAT array 


— Selectable software table search functionality by setting HID2[SWT_EN], 
bit 12. 


The MPC755 has a set of implementation-specific registers, exceptions, and instructions 
that facilitate very efficient software searching of the page tables in memory. This section 
describes those resources and provides three example code sequences that can be used in 
an MPC755 system for an efficient search of the translation tables in software. These three 
code sequences can be used as handlers for the three exceptions requiring access to the 
PTEs in the page tables in memory: instruction TLB miss, data TLB miss on load, and data 
TLB miss on store. 

Note that the remainder of the MMU definition and the rules about updating the page tables 
in memory for the MPC755 are the same as those for the MPC750. 


C.7.1 Software Table Search Resources 


In addition to setting up the translation page tables in memory, the system software must 
assist the processor in loading PTEs into the on-chip TLBs. When a required TLB entry is 
not found in the appropriate TLB, the processor vectors to one of the three TLB miss 
exception handlers so that the software can perform a table search operation and load the 
TLB. When this occurs, the processor automatically saves information about the access and 
the executing context. Table C-17 provides a summary of the implementation-specific 
exceptions, registers, and instructions that can be used by the TLB miss exception handler 
software in MPC755 systems. See Section C.4.3, “tlbld and tlbli Instructions,” for detailed 
information about the operation of the tlbli and tlbld instructions and C.6, “MPC755 
Exceptions (Chapter 4),” for more information about exception processing on the MPC755. 


Table C-17. Implementation-Specific Resources for Software Table Search 
Operations—Summary 








Resource       Name                               Description 
Exceptions     Instruction TLB miss exception     No matching entry found in ITLB 
               (vector offset 0x1000) 
               Data TLB miss for load exception   No matching entry found in DTLB for a load data access 
               (vector offset 0x1100) 
               Data TLB miss for store            No matching entry found in DTLB for a store data access, 
               exception (also caused when the    or matching DTLB entry has C = 0 and access is a store 
               changed bit must be updated) 
               (vector offset 0x1200) 
Registers      IMISS and DMISS                    When a TLB miss exception occurs, the IMISS or DMISS 
                                                  register contains the 32-bit effective address of the 
                                                  instruction or data access that caused the miss exception. 
               ICMP and DCMP                      The ICMP and DCMP registers contain the word to be 
                                                  compared with the first word of a PTE in the table search 
                                                  software routine to determine if a PTE contains the 
                                                  address translation for the instruction or data access. 
                                                  The contents of ICMP and DCMP are automatically derived 
                                                  by the MPC755 when a TLB miss exception occurs. 
               HASH1 and HASH2                    The HASH1 and HASH2 registers contain the primary and 
                                                  secondary PTEG addresses that correspond to the address 
                                                  causing a TLB miss. These PTEG addresses are 
                                                  automatically derived by the MPC755 by performing the 
                                                  primary and secondary hashing function on the contents of 
                                                  IMISS or DMISS, for an ITLB or DTLB miss exception, 
                                                  respectively. 
               RPA                                The system software loads a TLB entry by loading the 
                                                  second word of the matching PTE entry into the RPA 
                                                  register and then executing the tlbli or tlbld instruction 
                                                  (for loading the ITLB or DTLB, respectively). 
Instructions   tlbli rB                           Loads the contents of the ICMP and RPA registers into the 
                                                  ITLB entry selected by <ea> and SRR1[WAY] 
               tlbld rB                           Loads the contents of the DCMP and RPA registers into the 
                                                  DTLB entry selected by <ea> and SRR1[WAY] 





In addition, the MPC755 contains four additional SPRG registers SPRG[4—7] that have 
been implemented to save and restore general-purpose registers used by the exception 
handler. This is a replacement for having the MSR[TGPR] bit of the MPC603e and four 
temporary general-purpose registers. Note that the MSR[TGPR] bit is not implemented in 
the MPC755. If software table searching is not enabled, then these registers may be used 
for any supervisor purpose. 


C.7.2 Software Table Search Registers 


This section describes the format of the implementation-specific SPRs that are not defined 
by the PowerPC architecture, but are used by the TLB miss exception handlers. These 
registers can be accessed by supervisor-level instructions only. Any attempt to access these 
SPRs with user-level instructions results in a privileged instruction program exception. 
Because DMISS, IMISS, DCMP, ICMP, HASH1, HASH2, and RPA are used to access the 
translation tables for software table search operations, they should only be accessed when 
address translation is disabled (that is, MSR[IR] = 0 and MSR[DR] = 0). Note that 
MSR[IR] and MSR[DR] are cleared by the processor whenever an exception occurs. 


C.7.2.1 Data and Instruction TLB Miss Address Registers (DMISS and IMISS) 


The DMISS and IMISS registers share the format shown in Figure C-6. They are 
loaded automatically upon a data or instruction TLB miss. The DMISS and IMISS contain 
the effective page address of the access that caused the TLB miss exception. The contents 
are used by the processor when calculating the values of HASH1 and HASH2, and by the 
tlbld and tlbli instructions when loading a new TLB entry. Note that the MPC755 always 
loads a big-endian address into the DMISS register. These registers are read-only to the 
software. 


|                    Effective Page Address                    | 
 0                                                           31 

Figure C-6. DMISS and IMISS Registers 


C.7.2.2 Data and Instruction TLB Compare Registers (DCMP and ICMP) 


The DCMP and ICMP registers are shown in Figure C-7. These registers contain the first 
word in the required PTE. The contents are constructed automatically from the contents of 
the segment registers and the effective address (DMISS or IMISS) when a TLB miss 
exception occurs. Each PTE read from the tables in memory during the table search process 
should be compared with this value to determine whether or not the PTE is a match. Upon 
execution of a tlbld or tlbli instruction, the contents of the DCMP or ICMP register are 
loaded into the first word of the selected TLB entry. 


| V |                VSID                | H  |    API    | 
  0   1                               24   25   26      31 

Figure C-7. DCMP and ICMP Registers 


Table C-18 describes the bit settings for the DCMP and ICMP registers. 

Table C-18. DCMP and ICMP Bit Settings 

Bit     Name   Description 
0       V      Valid bit. Set by the processor on a TLB miss exception. 
1-24    VSID   Virtual segment ID. Copied from VSID field of corresponding segment register. 
25      H      Hash function identifier. Cleared by the processor on a TLB miss exception. 
26-31   API    Abbreviated page index. Copied from API of effective address. 
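
The following sketch assembles a compare value from the fields in Table C-18. It assumes, per the PowerPC page table definition, that the API is the upper six bits of the page index (effective address bits 4-9); `make_cmp` is a hypothetical name, and the processor performs this construction itself on a TLB miss.

```python
def make_cmp(vsid: int, ea: int, h: int = 0) -> int:
    """Build a DCMP/ICMP-style word: V=1, VSID, H, API."""
    api = (ea >> 22) & 0x3F        # EA bits 4-9 (assumed API position)
    return (
        0x80000000                 # bit 0: V, always set on a miss
        | ((vsid & 0xFFFFFF) << 7) # bits 1-24: VSID
        | ((h & 1) << 6)           # bit 25: H, cleared by the processor
        | api                      # bits 26-31: API
    )
```

Setting `h=1` yields the value compared against PTEs in the secondary PTEG, which is what the example handlers do with `ori r3, r3, 0x0040`.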





C.7.2.3 Primary and Secondary Hash Address Registers (HASH1 and HASH2) 


HASH1 and HASH2 contain the physical addresses of the primary and secondary PTEGs 
for the access that caused the TLB miss exception. Only bits 7—25 differ between them. For 
convenience, the processor automatically constructs the full physical address by routing 
bits 0-6 of SDR1 into HASH1 and HASH2 and clearing the lower six bits. These registers 
are read-only and are constructed from the contents of the DMISS or IMISS register. The 
format for the HASH1 and HASH2 registers is shown in Figure C-8. 





| HTABORG |        Hashed Page Address        | 000000 | 
  0     6   7                             25    26   31 
(bits 26-31 are reserved) 

Figure C-8. HASH1 and HASH2 Registers 

















Table C-19 describes the bit settings of the HASH1 and HASH2 registers. 

Table C-19. HASH1 and HASH2 Bit Settings 

Bit     Name                  Description 
0-6     HTABORG[0-6]          Copy of the upper 7 bits of the HTABORG field from SDR1 
7-25    Hashed page address   Address bits 7-25 of the PTEG to be searched. 
26-31   -                     Reserved 
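
For readers who want to check handler behavior off-target, the sketch below computes primary and secondary PTEG addresses using the page table hashing defined by the 32-bit PowerPC architecture. That the hardware-generated HASH1 and HASH2 values match this model exactly is an assumption of the example, and `pteg_addresses` is a hypothetical name.

```python
def pteg_addresses(sdr1: int, vsid: int, ea: int) -> tuple:
    """Return (primary, secondary) PTEG physical addresses."""
    page_index = (ea >> 12) & 0xFFFF         # EA bits 4-19
    primary = (vsid & 0x7FFFF) ^ page_index  # 19-bit primary hash
    secondary = ~primary & 0x7FFFF           # ones complement of primary

    htaborg = sdr1 & 0xFFFF0000              # SDR1 bits 0-15
    htabmask = sdr1 & 0x1FF                  # SDR1 bits 23-31

    def address(hash_value: int) -> int:
        upper = (hash_value >> 10) & htabmask  # hash bits 0-8, masked
        lower = hash_value & 0x3FF             # hash bits 9-18
        return htaborg | (upper << 16) | (lower << 6)

    return address(primary), address(secondary)
```

The low six bits of each result are zero, matching the reserved field in Table C-19, because each PTEG holds eight 8-byte PTEs.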











C.7.2.4 Required Physical Address Register (RPA) 


The RPA is shown in Figure C-9. During a page table search operation, the software must 
load the RPA with the second word of the correct PTE. When the tlbld or tlbli instruction 
is executed, data from the IMISS and ICMP (or DMISS and DCMP) and the RPA registers 
is merged and loaded into the selected TLB entry. The TLB entry is selected by the effective 
address of the access (loaded by the table search software from the DMISS or IMISS 
register) and the SRR1[WAY] bit. 

















|        RPN        |  000  | R  | C  |  WIMG  | 0  |  PP   | 
  0              19   20 22   23   24   25  28   29   30 31 
(bits 20-22 and bit 29 are reserved) 

Figure C-9. Required Physical Address (RPA) Register 


Table C-20 describes the bit settings of the RPA register. 

Table C-20. RPA Bit Settings 

Bit     Name   Description 
0-19    RPN    Physical page number from PTE 
20-22   -      Reserved 
23      R      Referenced bit from PTE 
24      C      Changed bit from PTE 
25-28   WIMG   Memory/cache access attribute bits 
29      -      Reserved 
30-31   PP     Page protection bits from PTE 
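
A small model of the Table C-20 layout may help when reading the handler constants (0x100 for R, 0x80 for C, 8 for the G bit). The function name `decode_rpa` is hypothetical.

```python
def decode_rpa(word: int) -> dict:
    """Split the second word of a PTE (the value loaded into RPA)."""
    return {
        "rpn": (word >> 12) & 0xFFFFF,  # bits 0-19
        "r": (word >> 8) & 1,           # bit 23, mask 0x100
        "c": (word >> 7) & 1,           # bit 24, mask 0x080
        "wimg": (word >> 3) & 0xF,      # bits 25-28; G is the low bit (mask 0x008)
        "pp": word & 0x3,               # bits 30-31
    }
```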





C.7.3 Software Table Search Operation 


When a TLB miss occurs and software table searching is enabled, the instruction or data 
MMU loads the IMISS or DMISS register, respectively, with the effective address of the 
access. The processor completes all instructions dispatched prior to the exception, status 
information is saved in SRR1, and one of the three TLB miss exceptions is taken. In 
addition, the processor loads the ICMP or DCMP register with the value to be compared 
with the first word of PTEs in the tables in memory. 

The software should then access the first PTE at the address pointed to by HASH1. The first 
word of the PTE should be loaded and compared to the contents of DCMP or ICMP. If there 
is a match, then the required PTE has been found and the second word of the PTE is loaded 
from memory into the RPA register. Then the tlbli or tlbld instruction is executed, which 
loads the contents of the ICMP (or DCMP) and RPA registers into the selected TLB entry. 
The TLB entry is selected by the effective address of the access and the SRR1[WAY] bit. 


If the compare did not result in a match, however, the PTEG address is incremented to point 
to the next PTE in the table and the above sequence is repeated. If none of the eight PTEs 
in the primary PTEG matches, the sequence is then repeated using the secondary PTEG (at 
the address contained in HASH2). 


If the PTE is also not found in the eight entries of the secondary page table, a page fault 
condition exists, and a page fault exception must be synthesized. Thus the appropriate bits 
must be set in SRR1 (or DSISR) and the TLB miss handler must branch to either the ISI or 
DSI exception handler, which handles the page fault condition. 
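
The search described above can be sketched as a small host-side model. It is not the handler itself: `read_word` stands in for a physical memory read, and the function and parameter names are hypothetical. The H bit (0x40) is set in the compare value for the secondary PTEG, as in the example handlers.

```python
H_BIT = 0x40  # hash function identifier in the first PTE word


def search_page_table(read_word, cmp_value: int, hash1: int, hash2: int):
    """Return (pte_address, second_word) for the matching PTE, or None."""
    for pteg, compare in ((hash1, cmp_value), (hash2, cmp_value | H_BIT)):
        for slot in range(8):                    # eight PTEs per PTEG
            pte = pteg + 8 * slot                # each PTE is two words
            if read_word(pte) == compare:
                return pte, read_word(pte + 4)   # second word goes to RPA
    return None                                  # page fault condition
```

A `None` result corresponds to the case where the handler must synthesize a page fault and branch to the ISI or DSI handler.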


C.7.3.1 Flow for Example Exception Handlers 


This section provides a flow diagram outlining some example software that can be used to 
handle the three TLB miss exceptions. 


Figure C-10 shows the flow for the example TLB miss exception handlers. The flow shown 
is common for the three exception handlers, except that the IMISS and ICMP registers are 
used for the instruction TLB miss exception while the DMISS and DCMP registers are used 
for the two data TLB miss exceptions. Also, for the cases of store instructions that cause 
either a TLB miss or require a table search operation to update the C bit, the flow shows 
that the C bit is set in both the TLB entry and the PTE in memory. Finally, in the case of a 
page fault (no PTE found in the table search operation), the setup for the ISI or DSI 
exception is slightly different for these two cases. 
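
The difference between the two setups can be summarized with a sketch based on the bit assignments in Figures C-12 and C-13 and the example code: bit 1 flags a PTE not found and bit 4 flags a protection violation, in DSISR for data accesses and in SRR1 for instruction accesses. The function names are hypothetical.

```python
NOT_FOUND = 0x40000000   # bit 1
PROTECTION = 0x08000000  # bit 4


def synthesize_dsi(srr1: int, dmiss: int, protection_violation: bool):
    """Return (new_srr1, dsisr, dar) for a synthesized DSI exception."""
    dsisr = ((srr1 >> 16) & 1) << 25           # DSISR[6] <- SRR1[15] (store flag)
    dsisr |= PROTECTION if protection_violation else NOT_FOUND
    new_srr1 = srr1 & 0xFFFF                   # clear upper bits of SRR1
    # SRR1[31] = 1 means little-endian mode: undo the address munging.
    dar = dmiss ^ 0x07 if new_srr1 & 1 else dmiss
    return new_srr1, dsisr, dar


def synthesize_isi(srr1: int, protection_violation: bool) -> int:
    """Return the SRR1 value for a synthesized ISI exception."""
    flag = PROTECTION if protection_violation else NOT_FOUND
    return (srr1 & 0xFFFF) | flag
```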


Figure C-11 shows the flow for checking the R and C bits and setting them appropriately. 
Figure C-12 shows the flow for synthesizing a page fault exception when no PTE is found. 
Figure C-13 shows the flow for managing the cases of a TLB miss on an instruction access 
to guarded memory, and a TLB miss when C = 0 and a protection violation exists. The setup 
for these protection violation exceptions is very similar to that of page fault conditions 
(as shown in Figure C-12) except that different bits in SRR1 (and DSISR) are set. 


1. Save old counter, CR0 bits, and 4 GPRs. 
2. Set counter: cnt ← 8. 
3. Load primary PTEG pointer: ptr ← HASH1 - 8; compare_value ← ICMP/DCMP. 
4. Read lower word of next PTE from memory: temp ← (ptr). 
   If temp = compare_value, go to step 5. 
   Otherwise, if cnt ≠ 0, repeat step 4. 
   Otherwise, if compare_value[H] = 1, the secondary hash is complete: set up for page 
   fault exception (see Figure C-12). 
   Otherwise, load secondary PTEG pointer: ptr ← HASH2 - 8; compare_value[H] ← 1; 
   set counter: cnt ← 8; repeat step 4. 
5. Read upper word of PTE: temp ← (ptr - 4); RPA ← temp. 
   If instruction access and temp[G] = 1, set up for protection violation exception 
   (see Figure C-13). Otherwise, continue. 
6. <ea> ← IMISS/DMISS. 
7. Check R, C bits and set as needed (see Figure C-11). 
8. Load TLB entry: tlbli <ea> (or tlbld <ea>). 
9. Restore old counter and CR0 bits. 
10. Return to executing program: rfi. 

Figure C-10. Flow for Example Software Table Search Operation 


Check R, C bits and set as needed: 

Handler for data store operation, temp[C] = 0 (check protection): 
   PP = 00 or 01: if SRR1[KEY] = 1, set up for protection violation exception 
                  (see Figure C-13); otherwise, set R, C bits. 
   PP = 10:       set R, C bits. 
   PP = 11:       set up for protection violation exception (see Figure C-13). 
   Set R, C bits: temp ← temp OR 0x180. 
   Store bytes 6, 7 of PTE to memory: (ptr - 2) ← temp[bytes 6, 7]. 
   Return to TLB miss exception flow (see Figure C-10). 

Otherwise: 
   Set R bit: temp ← temp OR 0x100. 
   Store byte 7 of PTE to memory: (ptr - 2) ← temp[byte 7]. 
   Return to TLB miss exception flow (see Figure C-10). 

Figure C-11. Check and Set R and C Bit Flow 


Set up for page fault exception: 

Data TLB miss handlers: 
   DSISR[6] ← SRR1[15] 
   Clear upper bits of SRR1: SRR1 ← SRR1 AND 0xFFFF 
   DSISR[1] ← 1 
   dtemp ← DMISS 
   If SRR1[31] = 1 (little-endian mode): dtemp ← dtemp XOR 0x07 
   DAR ← dtemp 
   Restore CR0 bits and GPRs 
   Branch to DSI exception handler 

Instruction TLB miss handler: 
   Clear upper bits of SRR1: SRR1 ← SRR1 AND 0xFFFF 
   SRR1[1] ← 1 
   Restore CR0 bits and GPRs 
   Branch to ISI exception handler 

Figure C-12. Page Fault Setup Flow 


Set up for protection violation exceptions: 

Data TLB miss handlers (data access to protected memory; C = 0): 
   DSISR[6] ← SRR1[15] 
   Clear upper bits of SRR1: SRR1 ← SRR1 AND 0xFFFF 
   DSISR[4] ← 1 
   dtemp ← DMISS 
   If SRR1[31] = 1 (little-endian mode): dtemp ← dtemp XOR 0x07 
   DAR ← dtemp 
   Restore CR0 bits and GPRs 
   Branch to DSI exception handler 

Instruction TLB miss handler (instruction access to guarded memory): 
   Clear upper bits of SRR1: SRR1 ← SRR1 AND 0xFFFF 
   SRR1[4] ← 1 
   Restore CR0 bits and GPRs 
   Branch to ISI exception handler 

Figure C-13. Setup for Protection Violation Exceptions 
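
The protection check that precedes setting the C bit (Figure C-11) reduces to a small truth table over PP and KEY. The sketch below models it under that reading; `store_permitted` and `update_rc_for_store` are hypothetical names, and raising `PermissionError` merely stands in for branching to the protection violation setup.

```python
R_BIT = 0x100  # referenced bit in the second PTE word
C_BIT = 0x080  # changed bit in the second PTE word


def store_permitted(pp: int, key: int) -> bool:
    """True if a store may proceed for the given PP bits and KEY."""
    if pp in (0b00, 0b01):
        return key == 0      # writable only when KEY = 0
    return pp == 0b10        # PP = 10 is read/write; PP = 11 is read-only


def update_rc_for_store(pte_word1: int, key: int) -> int:
    """Return the second PTE word after a store, setting R and C if allowed."""
    if pte_word1 & C_BIT:
        return pte_word1     # C already set: nothing to update
    if not store_permitted(pte_word1 & 0x3, key):
        raise PermissionError("protection violation: synthesize a DSI")
    return pte_word1 | R_BIT | C_BIT
```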


C.7.3.2 Code for Example Exception Handlers 


This section provides some assembly language examples that implement the flow diagrams 
described above. Note that although these routines fit into a few cache lines, they are 
supplied only as a functional example; they could be further optimized for faster 
performance. 

















# TLB software load for MPC755 
# 
# New Instructions: 
#    tlbld - write the dtlb with the pte in rpa reg 
#    tlbli - write the itlb with the pte in rpa reg 
# New SPRs 
#    dmiss - address of dstream miss 
#    imiss - address of istream miss 
#    hash1 - returns primary hash PTEG address 
#    hash2 - returns secondary hash PTEG address 
#    iCmp  - returns the primary istream compare value 
#    dCmp  - returns the primary dstream compare value 
#    rpa   - the second word of pte used by tlblx 
# 
# there are three flows. 
#    tlbDataMiss  - tlb miss on data load 
#    tlbCeq0      - tlb miss on data store or store with tlb change bit == 0 
#    tlbInstrMiss - tlb miss on instruction fetch 
#+ 
# place labels for rel branches 
# gpr r0..r3 are saved into SPRG4-7 
        .machine PPC_755 
        .set    r0, 0 
        .set    r1, 1 
        .set    r2, 2 
        .set    r3, 3 
        .set    dMiss, 1010 
        .set    dCmp, 1011 
        .set    hash1, 1012 
        .set    hash2, 1013 
        .set    iMiss, 1014 
        .set    iCmp, 1015 
        .set    rpa, 1010 
        .set    c0, 0 
        .set    dar, 19 
        .set    dsisr, 18 
        .set    srr0, 26 
        .set    srr1, 27 
        .set    sprg4, 276 
        .set    sprg5, 277 
        .set    sprg6, 278 
        .set    sprg7, 279 
        .csect  tlbmiss[PR] 
vec0: 
        .globl  vec0 
        .org    vec0+0x300 
vec300: 
        .org    vec0+0x400 
vec400: 
#+ 
# Instruction TB miss flow 
# Entry: 
#    Vec = 1000 
#    srr0  -> address of instruction that missed 
#    srr1  -> 0:3=cr0 4=lru way bit 16:31 = saved MSR 
#    iMiss -> ea that missed 
#    iCmp  -> the compare value for the va that missed 
#    hash1 -> pointer to first hash pteg 
#    hash2 -> pointer to second hash pteg 
# 
# Register usage: 
#    Existing values of r0-r3 saved into sprg4-sprg7 
#    r0-r3 used in the exception handler as follows 
#    r0 is saved counter 
#    r1 is junk 
#    r2 is pointer to pteg 
#    r3 is current compare value 
        .org    vec0+0x1000 
tlbInstrMiss: 
        mtspr   sprg4, r0         # save r0 into sprg4 
        mtspr   sprg5, r1         # save r1 into sprg5 
        mtspr   sprg6, r2         # save r2 into sprg6 
        mtspr   sprg7, r3         # save r3 into sprg7 
        mfspr   r2, hash1         # get first pointer 
        addi    r1, 0, 8          # load 8 for counter 
        mfctr   r0                # save counter 
        mfspr   r3, iCmp          # get first compare value 
        addi    r2, r2, -8        # pre dec the pointer 
im0:    mtctr   r1                # load counter 
im1:    lwzu    r1, 8(r2)         # get next pte 
        cmp     c0, r1, r3        # see if found pte 
        bdnzf   eq, im1           # dec count br if cmp ne and if count not zero 
        bne     instrSecHash      # if not found set up second hash or exit 
        lwz     r1, +4(r2)        # load tlb entry lower-word 
        andi.   r3, r1, 8         # check G-bit 
        bne     doISIp            # if guarded, take an ISI 
        mtctr   r0                # restore counter 
        mfspr   r0, iMiss         # get the miss address for the tlbli 
        mfspr   r3, srr1          # get the saved cr0 bits 
        mtcrf   0x80, r3          # restore CR0 
        mtspr   rpa, r1           # set the pte 
        ori     r1, r1, 0x100     # set reference bit 
        srwi    r1, r1, 8         # get byte 7 of pte 
        tlbli   r0                # load the itlb 
        stb     r1, +6(r2)        # update page table 
        mfspr   r0, sprg4         # restore old value of r0 
        mfspr   r1, sprg5         # restore old value of r1 
        mfspr   r2, sprg6         # restore old value of r2 
        mfspr   r3, sprg7         # restore old value of r3 
        rfi                       # return to executing program 
#+ 
# Register usage: 
#    r0 is saved counter 
#    r1 is junk 
#    r2 is pointer to pteg 
#    r3 is current compare value 
instrSecHash: 
        andi.   r1, r3, 0x0040    # see if we have done second hash 
        bne     doISI             # if so, go to ISI exception 
        mfspr   r2, hash2         # get the second pointer 
        ori     r3, r3, 0x0040    # change the compare value 
        addi    r1, 0, 8          # load 8 for counter 
        addi    r2, r2, -8        # pre dec for update on load 
        b       im0               # try second hash 
#+ 
# entry Not Found: synthesize an ISI exception 
# guarded memory protection violation: synthesize an ISI exception 
# Entry: 
#    r0 is saved counter 
#    r1 is junk 
#    r2 is pointer to pteg 
#    r3 is current compare value 
doISIp: 
        mfspr   r3, srr1          # get srr1 
        andi.   r2, r3, 0xFFFF    # clean upper srr1 
        addis   r2, r2, 0x0800    # or in srr1<4> = 1 to flag prot violation 
        b       isi1 
doISI: 
        mfspr   r3, srr1          # get srr1 
        andi.   r2, r3, 0xFFFF    # clean srr1 
        addis   r2, r2, 0x4000    # or in srr1<1> = 1 to flag pte not found 
isi1:   mtctr   r0                # restore counter 
        mtspr   srr1, r2          # set srr1 
        mtcrf   0x80, r3          # restore CR0 
        mfspr   r0, sprg4         # restore old value of r0 
        mfspr   r1, sprg5         # restore old value of r1 
        mfspr   r2, sprg6         # restore old value of r2 
        mfspr   r3, sprg7         # restore old value of r3 
        b       vec400            # go to instr. access exception 
#+ 
# Data TLB miss flow 
# Entry: 
#    Vec = 1100 
#    srr0  -> address of instruction that caused data tlb miss 
#    srr1  -> 0:3=cr0 4=lru way bit 5=1 if store 16:31 = saved MSR 
#    dMiss -> ea that missed 
#    dCmp  -> the compare value for the va that missed 
#    hash1 -> pointer to first hash pteg 
#    hash2 -> pointer to second hash pteg 
# 
# Register usage: 
#    r0 is saved counter 
#    r1 is junk 
#    r2 is pointer to pteg 
#    r3 is current compare value 
        .csect  tlbmiss[PR] 
        .org    vec0+0x1100 
tlbDataMiss: 
        mtspr   sprg4, r0         # save r0 into sprg4 
        mtspr   sprg5, r1         # save r1 into sprg5 
        mtspr   sprg6, r2         # save r2 into sprg6 
        mtspr   sprg7, r3         # save r3 into sprg7 
        mfspr   r2, hash1         # get first pointer 
        addi    r1, 0, 8          # load 8 for counter 
        mfctr   r0                # save counter 
        mfspr   r3, dCmp          # get first compare value 
        addi    r2, r2, -8        # pre dec the pointer 
dm0:    mtctr   r1                # load counter 
dm1:    lwzu    r1, 8(r2)         # get next pte 
        cmp     c0, r1, r3        # see if found pte 
        bdnzf   eq, dm1           # dec count br if cmp ne and if count not zero 
        bne     dataSecHash       # if not found set up second hash or exit 
        lwz     r1, +4(r2)        # load tlb entry lower-word 
        mtctr   r0                # restore counter 
        mfspr   r0, dMiss         # get the miss address for the tlbld 
        mfspr   r3, srr1          # get the saved cr0 bits 
        mtcrf   0x80, r3          # restore CR0 
        mtspr   rpa, r1           # set the pte 
        ori     r1, r1, 0x100     # set reference bit 
        srwi    r1, r1, 8         # get byte 7 of pte 
        tlbld   r0                # load the dtlb 
        stb     r1, +6(r2)        # update page table 
        mfspr   r0, sprg4         # restore old value of r0 
        mfspr   r1, sprg5         # restore old value of r1 
        mfspr   r2, sprg6         # restore old value of r2 
        mfspr   r3, sprg7         # restore old value of r3 
        rfi                       # return to executing program 
#+ 
# Register usage: 
#    r0 is saved counter 
#    r1 is junk 
#    r2 is pointer to pteg 
#    r3 is current compare value 
dataSecHash: 
        andi.   r1, r3, 0x0040    # see if we have done second hash 
        bne     doDSI             # if so, go to DSI exception 
        mfspr   r2, hash2         # get the second pointer 
        ori     r3, r3, 0x0040    # change the compare value 
        addi    r1, 0, 8          # load 8 for counter 
        addi    r2, r2, -8        # pre dec for update on load 
        b       dm0               # try second hash 
#+ 
# C=0 in dtlb and dtlb miss on store flow 
# Entry: 
#    Vec = 1200 
#    srr0  -> address of store that caused the exception 
#    srr1  -> 0:3=cr0 4=lru way bit 5=1 16:31 = saved MSR 
#    dMiss -> ea that missed 
#    dCmp  -> the compare value for the va that missed 
#    hash1 -> pointer to first hash pteg 
#    hash2 -> pointer to second hash pteg 
# 
# Register usage: 
#    r0 is saved counter 
#    r1 is junk 
#    r2 is pointer to pteg 
#    r3 is current compare value 
        .csect  tlbmiss[PR] 
        .org    vec0+0x1200 
tlbCeq0: 
        mtspr   sprg4, r0         # save r0 into sprg4 
        mtspr   sprg5, r1         # save r1 into sprg5 
        mtspr   sprg6, r2         # save r2 into sprg6 
        mtspr   sprg7, r3         # save r3 into sprg7 
        mfspr   r2, hash1         # get first pointer 
        addi    r1, 0, 8          # load 8 for counter 
        mfctr   r0                # save counter 
        mfspr   r3, dCmp          # get first compare value 
        addi    r2, r2, -8        # pre dec the pointer 
ceq0:   mtctr   r1                # load counter 
ceq1:   lwzu    r1, 8(r2)         # get next pte 
        cmp     c0, r1, r3        # see if found pte 
        bdnzf   eq, ceq1          # dec count br if cmp ne and if count not zero 
        bne     cEq0SecHash       # if not found set up second hash or exit 
        lwz     r1, +4(r2)        # load tlb entry lower-word 
        andi.   r3, r1, 0x80      # check the C-bit 
        beq     cEq0ChkProt       # if (C==0) go check protection modes 
ceq2:   mtctr   r0                # restore counter 
        mfspr   r0, dMiss         # get the miss address for the tlbld 
        mfspr   r3, srr1          # get the saved cr0 bits 
        mtcrf   0x80, r3          # restore CR0 
        mtspr   rpa, r1           # set the pte 
        tlbld   r0                # load the dtlb 
        mfspr   r0, sprg4         # restore old value of r0 
        mfspr   r1, sprg5         # restore old value of r1 
        mfspr   r2, sprg6         # restore old value of r2 
        mfspr   r3, sprg7         # restore old value of r3 
        rfi                       # return to executing program 
#
# Register usage:
#       r0 is saved counter
#       r1 is junk
#       r2 is pointer to pteg
#       r3 is current compare value
cEq0SecHash:
        andi.   r1, r3, 0x0040     # see if we have done second hash
        bne     doDSI              # if so, go to DSI exception
        mfspr   r2, hash2          # get the second pointer
        ori     r3, r3, 0x0040     # change the compare value
        addi    r1, 0, 8           # load 8 for counter
        addi    r2, r2, -8         # pre dec for update on load
        b       ceq0               # try second hash
#
# Entry found and PTE(c-bit==0):
# (check protection before setting PTE(c-bit))
# Register usage:
#       r0 is saved counter
#       r1 is PTE entry
#       r2 is pointer to pteg
#       r3 is trashed
cEq0ChkProt:
        rlwinm. r3, r1, 30, 0, 1   # test PP
        bge-    chk0               # if (PP==00 or PP==01) goto chk0:
        andi.   r3, r1, 1          # test PP[0]
        beq+    chk2               # return if PP[0]==0
        b       doDSIp             # else DSIp
chk0:   mfspr   r3, srr1           # get old msr
        andis.  r3, r3, 0x0008     # test the KEY bit (SRR1-bit 12)
        beq     chk2               # if (KEY==0) goto chk2:
        b       doDSIp             # else DSIp
chk2:   ori     r1, r1, 0x180      # set reference and change bit
        sth     r1, 6(r2)          # update page table
        b       ceq2               # and back we go
#
# entry Not Found: synthesize a DSI exception
# Entry:
#       r0 is saved counter
#       r1 is junk
#       r2 is pointer to pteg
#       r3 is current compare value
doDSI:
        mfspr   r3, srr1           # get srr1
        rlwinm  r1, r3, 9, 6, 6    # get srr1<flag> to bit 6 for load/store, zero rest
        addis   r1, r1, 0x4000     # or in dsisr<1> = 1 to flag pte not found
        b       dsi1
doDSIp:
        mfspr   r3, srr1           # get srr1
        rlwinm  r1, r3, 9, 6, 6    # get srr1<flag> to bit 6 for load/store, zero rest
        addis   r1, r1, 0x0800     # or in dsisr<4> = 1 to flag prot violation
dsi1:   mtctr   r0                 # restore counter
        andi.   r2, r3, 0xFFFF     # clear upper bits of srr1
        mtspr   srr1, r2           # set srr1
        mtspr   dsisr, r1          # load the dsisr
        mfspr   r1, dMiss          # get miss address
        rlwinm. r2, r2, 0, 31, 31  # test LE bit
        beq     dsi2               # if little endian then:
        xor     r1, r1, 0x07       # de-mung the data address
dsi2:   mtspr   dar, r1            # put in dar
        mtcrf   0x80, r3           # restore CR0
        mfspr   r0, sprg4          # restore old value of r0
        mfspr   r1, sprg5          # restore old value of r1
        mfspr   r2, sprg6          # restore old value of r2
        mfspr   r3, sprg7          # restore old value of r3
b vec300 # branch to DSI exception 
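
The table search that the handlers above perform can be modeled at a high level as follows. This is a minimal Python sketch under stated assumptions, not the handler itself: find_pte, store_allowed, and the list-of-tuples PTEG model are hypothetical names introduced for illustration. The 0x0040 second-hash flag, the count of eight PTEs per group, and the PP/KEY test mirror the assembly listing.

```python
H_BIT = 0x0040  # second-hash flag set in the compare value by "ori r3, r3, 0x0040"


def find_pte(compare, pteg1, pteg2):
    """Return the lower word of the matching PTE, or None if both hashes miss.

    compare      -- first-hash compare value (what the handler reads from dCmp)
    pteg1, pteg2 -- hypothetical model: lists of 8 (upper_word, lower_word) tuples
    """
    for pteg in (pteg1, pteg2):
        for upper, lower in pteg[:8]:   # the handler loads 8 into the counter
            if upper == compare:
                return lower            # found: lower word goes to RPA
        if compare & H_BIT:             # second hash already tried
            break
        compare |= H_BIT                # change the compare value, try hash2
    return None                         # the handler would synthesize a DSI


def store_allowed(pte_lower, key):
    """Model of the cEq0ChkProt test applied before setting the C bit."""
    pp = pte_lower & 0x3
    if pp in (0, 1):                    # PP==00 or PP==01: depends on KEY
        return key == 0
    return pp == 2                      # PP==10 allows the store, PP==11 does not
```

A miss in both groups corresponds to the doDSI path, and a failed store_allowed check corresponds to the doDSIp path.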


C.8 MPC755 Instruction Timing (Chapter 6) 


The instruction timing of the MPC755 is identical to that of the MPC750 except for the
addition of the new tlbli and tlbld instructions.


Table C-21 provides latencies for the new tlbli and tlbld instructions. 


Table C-21. TLB Load and Store Instruction Latencies

Primary Opcode | Extended Opcode | Mnemonic | Execution Unit | Clock Cycles
31             | 978             | tlbld    | LSU            | 2&
31             | 1010            | tlbli    | LSU            | 3&

Note: Cycle times marked with “&” require a variable number of cycles due to serialization.


C.9 MPC755 Signal Descriptions (Chapter 7) 


This section describes two new signals that select the I/O voltages for the system bus 
(BVSEL) and the L2 interface (L2VSEL) as described in Table C-22. Refer to the MPC755
Hardware Specification for more detailed information about these signals. All other 
MPC755 signals operate the same as the MPC750 signals. 


Table C-22. Voltage-Select Signal Descriptions

Signal           Comments
BVSEL, L2VSEL    BVSEL and L2VSEL are assigned to two unused BGA positions on
                 the MPC755 360-pin and MPC745 255-pin BGA footprint. Internal
                 pull-ups are provided to default to MPC750-compatible I/O voltages
                 if unconnected.

C.10 MPC755 System Interface Operation (Chapter 8) 


This section describes the MPC755 embedded processor bus interface and how its 
operation differs from the MPC750. It shows how the signals, defined in Chapter 7, “Signal 
Descriptions,” interact to perform address and data transfers and describes how the 32-bit 
bus mode is implemented on the MPC755. 


C.10.1 MPC755 System Interface Overview 


The system interface prioritizes requests for bus operations from the instruction and data 
caches, and performs bus operations in accordance with the 60x bus protocol. It includes 
address register queues, prioritization logic, and a bus control unit. The system interface
latches snoop addresses for snooping in the data cache and in the address register queues,
and for reservations controlled by the Load Word and Reserve Indexed (lwarx) and Store
Word Conditional Indexed (stwcx.) instructions, and maintains the touch load address for
the cache. The interface allows one level of pipelining; that is, with certain restrictions 
described later, there can be two outstanding transactions at any given time. Accesses are 
prioritized with load operations preceding store operations. 


Instructions are automatically fetched from the memory system into the instruction unit 
where they are dispatched to the execution units at a peak rate of two instructions per clock. 
Conversely, load and store instructions explicitly specify the movement of operands to and 
from the integer and floating-point register files and the memory system. 


When the MPC755 encounters an instruction or data access, it calculates the logical address 
and uses the low-order address bits to check for a hit in the on-chip, 32-Kbyte instruction 
and data caches. During cache lookup, the instruction and data memory management units 
(MMUs) use the higher-order address bits to calculate the virtual address, from which they 
calculate the physical address. The physical address bits are then compared with the 
corresponding cache tag bits to determine if a cache hit occurred in the L1 instruction or 
data cache. If the access misses in the corresponding cache, the physical address is used to 
access the L2 cache tags (if the L2 cache is enabled). If no match is found in the L2 cache 
tags, the physical address is used to access system memory. 


In addition to the loads, stores, and instruction fetches, the MPC755 performs hardware 
table search operations following TLB misses; L2 cache cast-out operations when 
least-recently used cache lines are written to memory after a cache miss; and cache-line 
snoop push-out operations when a modified cache line experiences a snoop hit from another 
bus master. 


Figure C-1 shows the address path from the execution units and instruction fetcher through 
the translation logic to the caches and system interface logic. 


The MPC755 uses separate address and data buses and a variety of control and status 
signals for performing reads and writes. The address bus is 32 bits and the data bus is 32 or 
64 bits. The interface is synchronous—all MPC755 inputs are sampled, and all outputs are 
driven from the rising edge of the bus clock. The processor runs at a multiple of the system 
bus-clock speed. The MPC755 core operates at 1.9 to 2.1 volts, and the I/O signals operate at
1.8 or 3.3 volts. 


C.10.2 Address Bus Pipelining 


The MPC750 and MPC755 pipeline the address bus identically. The address tenure of an
instruction transaction can begin before the data tenure of a preceding data transaction
completes. Likewise, the address tenure of a data transaction can be pipelined behind
an instruction transaction. However, address bus pipelining does not occur for two
consecutive instruction or two consecutive data transactions. Note that this behavior is not
documented in Chapter 8, “System Interface Operation.” 
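
The pipelining rule above reduces to a single comparison. The following is a minimal Python sketch; the function name and the string labels "instruction" and "data" are hypothetical and chosen only for illustration.

```python
def address_pipelined(previous_kind, current_kind):
    """Return True if the current address tenure can be pipelined.

    previous_kind, current_kind -- "instruction" or "data" (illustrative labels).
    Per the text, pipelining occurs only when the two transactions differ in kind.
    """
    kinds = ("instruction", "data")
    if previous_kind not in kinds or current_kind not in kinds:
        raise ValueError("kind must be 'instruction' or 'data'")
    return previous_kind != current_kind
```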
C.10.3 Bus Clocking


Like the MPC750, the MPC755 requires a single system clock input (SYSCLK), which the
phase-locked loop (PLL) circuit uses to generate a master clock for all of the CPU circuitry
(including the bus interface circuitry). The master clock is frequency- and phase-locked to
the SYSCLK input and may be set to integer or half-clock multiples of the SYSCLK frequency.
Refer to the MPC755 Hardware Specification for the ratios supported.


C.10.4 32-Bit Data Bus Mode 


The MPC755 supports an optional 32-bit data bus mode in which the data bus high and
corresponding parity signals (DH[0:31] and DP[0:3]) are used, and the data bus low and
corresponding parity signals (DL[0:31] and DP[4:7]) are ignored. The following list
summarizes the functionality of the 32-bit data bus mode on the MPC755:


• Data tenures of 1, 2, and 8 beats are supported (1 to 4 bytes per beat).

• The address and transfer attribute information is unchanged from 64-bit mode.

• The TBST and TSIZ[0:2] signals must be reinterpreted for burst size.

• Data termination is the same for each data beat using TA, DRTRY, and TEA.

• 32-bit mode is configured at power-on (hard reset) through the TLBISYNC signal.


The 32-bit data bus mode operates the same as the 64-bit data bus mode with the exception 
of the byte lanes involved in the transfer and the number of data beats that are performed. 
Only byte lanes 0 through 3 are used, corresponding to the data bus signals DH[0:31] and 
DP[0:3]. Byte lanes 4 through 7 (corresponding to DL[0:31] and DP[4:7]) are never used 
in this mode. The unused data bus signals are not sampled by the processor during read 
operations, and they are driven low during write operations. 


The number of data beats required for a data tenure in 32-bit data bus mode is one, two,
or eight, depending on the size of the transaction and the cache mode for the address. Data
transactions of one or two data beats are performed for cache-inhibited load/store or 
write-through store operations. These transactions do not assert the TBST signal even 
though a two-beat burst may be performed (that is, the same TBST and TSIZ[0:2] encoding 
as in 64-bit data bus mode). Single-beat data transactions are performed for operations of 
size 4 bytes or less, and double-beat data transactions are performed for 8-byte operations 
only. (The processor only generates an 8-byte operation for a double-word aligned load or 
store-double operation to or from the floating-point registers.) 








Data transactions of eight data beats are performed for burst operations that load into or cast 
out from the MPC755 internal caches. These transactions transfer 32 bytes similarly to 
64-bit mode, and they assert the TBST signal and indicate a transfer size of two (TSIZ[0:2] 
= 010) similar to 64-bit data bus mode. 
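
The beat counts described above can be summarized in a short Python sketch. The function name and its return shape are hypothetical; the beat counts and the TBST behavior are taken from the preceding paragraphs.

```python
def data_beats_32bit(size_bytes, burst):
    """Return (beats, tbst_asserted) for a data tenure in 32-bit data bus mode.

    size_bytes -- operand size for non-burst transfers (1 to 4, or 8)
    burst      -- True for a 32-byte cache-line load or castout
    """
    if burst:
        return 8, True       # 32 bytes at 4 bytes per beat, TBST asserted
    if size_bytes == 8:
        return 2, False      # double word: two beats, TBST not asserted
    if 1 <= size_bytes <= 4:
        return 1, False      # word or smaller: single beat
    raise ValueError("unsupported transfer size")
```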





Otherwise, the same bus protocols apply for arbitration, transfer, and termination of the 
address and data tenures in 32-bit data bus mode as apply in 64-bit data bus mode. Late
ARTRY cancellation of the data tenure applies on the bus clock cycle after the first data beat
is acknowledged (after the first TA) for word or smaller transactions, or on the bus clock 
cycle after the second data beat is acknowledged (after the second TA) for double-word or 
burst operations (or coincident with the respective TA if no-DRTRY mode is selected). 





C.10.4.1 Burst Ordering 


For burst operations in 32-bit mode, a data block of 32 bytes (one cache line) is transferred
in the same order as in 64-bit data bus mode with the exception that eight data beats are 
required to perform the transfer instead of four. For each double word of the block that is 
transferred, the upper word of the double word is transferred first on the data bus (on 
DH[0:31]), and then the lower word of the double word is transferred. Table C-23 shows 
the burst order for each starting address. 


Table C-23. Burst Ordering

                  For Double Word Starting Address:
Data Transfer     A[27:28] = 00 | A[27:28] = 01 | A[27:28] = 10 | A[27:28] = 11
1st Data Beat     DW0-u         | DW1-u         | DW2-u         | DW3-u
2nd Data Beat     DW0-l         | DW1-l         | DW2-l         | DW3-l
3rd Data Beat     DW1-u         | DW2-u         | DW3-u         | DW0-u
4th Data Beat     DW1-l         | DW2-l         | DW3-l         | DW0-l
5th Data Beat     DW2-u         | DW3-u         | DW0-u         | DW1-u
6th Data Beat     DW2-l         | DW3-l         | DW0-l         | DW1-l
7th Data Beat     DW3-u         | DW0-u         | DW1-u         | DW2-u
8th Data Beat     DW3-l         | DW0-l         | DW1-l         | DW2-l

Notes:

A[27:28] specifies the first double word of the 32-byte block being transferred; the remaining
double words to transfer must wrap around the block.

A[29:31] are always 0b000 for burst transfers initiated by the MPC755.

“DWx” represents the double word that would be addressed by A[27:28] = “x” if a non-burst
transfer were performed. “u” and “l” represent the upper word and lower word of the double
word, respectively.


Each data beat is terminated with one valid assertion of TA (without DRTRY cancellation). 
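
The wrap-around ordering in Table C-23 can be generated mechanically. The following Python sketch is illustrative only; the function name is hypothetical, and the labels follow the "DWx-u" and "DWx-l" notation used in the table.

```python
def burst_order_32bit(a27_28):
    """Return the eight beat labels for a 32-bit-mode burst.

    a27_28 -- value of A[27:28] (0 to 3), the first double word of the block.
    Each double word is sent upper word first, and the sequence wraps
    around the four-double-word (32-byte) block.
    """
    if not 0 <= a27_28 <= 3:
        raise ValueError("A[27:28] must be 0 to 3")
    order = []
    for i in range(4):
        dw = (a27_28 + i) % 4
        order.append(f"DW{dw}-u")
        order.append(f"DW{dw}-l")
    return order
```

For example, a starting value of 0b10 yields the third column of the table, beginning with DW2-u and ending with DW1-l.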


C.10.4.2 Aligned Transfers 


The aligned data transfer cases for 32-bit data bus mode are shown in Table C-24. All of the 
transfers require a single data beat (if cache-inhibited or write-through) except for 
double-word cases that require two data beats. The double-word case is only generated by 
the processor for load or store-double operations to/from the floating-point registers. All 
cache-inhibited instruction fetches are performed as word operations. 
Table C-24. Aligned Data Transfers—32-Bit Data Bus Mode

                                               Data Bus Byte Lanes
Transfer Size           TSIZ[0:2]  A[29:31]    DH0...DH31   | DL0...DL31
                                               B0 B1 B2 B3  | B4 B5 B6 B7
Byte—1 Beat             001        000         A  —  —  —   | x  x  x  x
Byte—1 Beat             001        001         —  A  —  —   | x  x  x  x
Byte—1 Beat             001        010         —  —  A  —   | x  x  x  x
Byte—1 Beat             001        011         —  —  —  A   | x  x  x  x
Byte—1 Beat             001        100         A  —  —  —   | x  x  x  x
Byte—1 Beat             001        101         —  A  —  —   | x  x  x  x
Byte—1 Beat             001        110         —  —  A  —   | x  x  x  x
Byte—1 Beat             001        111         —  —  —  A   | x  x  x  x
Half-Word—1 Beat        010        000         A  A  —  —   | x  x  x  x
Half-Word—1 Beat        010        010         —  —  A  A   | x  x  x  x
Half-Word—1 Beat        010        100         A  A  —  —   | x  x  x  x
Half-Word—1 Beat        010        110         —  —  A  A   | x  x  x  x
Word—1 Beat             100        000         A  A  A  A   | x  x  x  x
Word—1 Beat             100        100         A  A  A  A   | x  x  x  x
Double Word—1st Beat    000        000         A  A  A  A   | x  x  x  x
Double Word—2nd Beat    000                    A  A  A  A   | x  x  x  x






























Notes: 


“A”: Byte lanes that are read or written during that bus transaction.
“—”: These lanes are ignored during read transactions and driven with undefined data during write transactions.
“x”: Byte lanes are not used in 32-bit data bus mode. They are not sampled by the MPC755 during reads and are
driven low during writes. 
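
The lane selection for aligned single-beat transfers follows directly from the rule that only byte lanes 0 through 3 are used. The Python sketch below is a hypothetical helper, not part of the specification; it writes "-" for the ignored-lane symbol and assumes that the low two bits of A[29:31] select the starting lane.

```python
def aligned_lanes_32bit(size_bytes, a29_31):
    """Return the eight lane markers B0..B7 for an aligned single-beat transfer.

    size_bytes -- 1, 2, or 4
    a29_31     -- value of A[29:31] (0 to 7); must be aligned to size_bytes
    "A" marks a lane that carries data, "-" an ignored DH lane,
    and "x" a DL lane that is unused in 32-bit mode.
    """
    if size_bytes not in (1, 2, 4) or a29_31 % size_bytes != 0:
        raise ValueError("not an aligned byte, half-word, or word transfer")
    first = a29_31 % 4                  # assumed: position within DH[0:31]
    lanes = ["-"] * 4
    for i in range(first, first + size_bytes):
        lanes[i] = "A"
    return lanes + ["x"] * 4
```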


C.10.4.3 Misaligned Data Transfers 


Misaligned data transfer cases operate similarly in 32-bit data bus mode as in 64-bit data 
bus mode with the usual exception that only the DH[0:31] data bus is used. An example of 
a four-byte misaligned transfer starting at each possible byte address within a double word 
is shown in Table C-25. 


Table C-25. Misaligned Data Transfers Example—32-Bit Data Bus Mode

                                               Data Bus Byte Lanes
Program Size of         TSIZ[0:2]  A[29:31]    DH0...DH31   | DL0...DL31
Word (4 Bytes)                                 B0 B1 B2 B3  | B4 B5 B6 B7
Aligned                 100        000         A  A  A  A   | x  x  x  x
Misaligned—1st Access   011        001         —  A  A  A   | x  x  x  x
            2nd Access  001        100         A  —  —  —   | x  x  x  x
Misaligned—1st Access   010        010         —  —  A  A   | x  x  x  x
            2nd Access  010        100         A  A  —  —   | x  x  x  x
Misaligned—1st Access   001        011         —  —  —  A   | x  x  x  x
            2nd Access  011        100         A  A  A  —   | x  x  x  x
Aligned                 100        100         A  A  A  A   | x  x  x  x
Misaligned—1st Access   011        101         —  A  A  A   | x  x  x  x
            2nd Access  001        000         A  —  —  —   | x  x  x  x
Misaligned—1st Access   010        110         —  —  A  A   | x  x  x  x
            2nd Access  010        000         A  A  —  —   | x  x  x  x
Misaligned—1st Access   001        111         —  —  —  A   | x  x  x  x
            2nd Access  011        000         A  A  A  —   | x  x  x  x
Notes: 


“A”: Byte lane read in 
“x”: Ignored byte lane (does not need to be valid) 
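
The rows of Table C-25 follow one rule: a misaligned word is split at the word boundary into two accesses. The Python sketch below reproduces the TSIZ[0:2] and A[29:31] values from the table; the function name and tuple layout are hypothetical, and "-" stands for the ignored-lane symbol.

```python
TSIZ = {1: "001", 2: "010", 3: "011", 4: "100"}  # encodings as listed in Table C-25


def _lanes(first, count):
    # Lane markers for B0..B3 only; B4..B7 are always unused in 32-bit mode.
    return "".join("A" if first <= i < first + count else "-" for i in range(4))


def split_misaligned_word(offset):
    """Return [(tsiz, a29_31, lanes_b0_b3), ...] for a 4-byte access.

    offset -- starting byte address within the double word (0 to 7)
    """
    if not 0 <= offset <= 7:
        raise ValueError("offset must be 0 to 7")
    first_len = 4 - offset % 4          # bytes up to the next word boundary
    accesses = [(TSIZ[first_len], format(offset, "03b"),
                 _lanes(offset % 4, first_len))]
    if first_len < 4:                   # remainder goes in a second access
        second_addr = (offset + first_len) % 8
        second_len = 4 - first_len
        accesses.append((TSIZ[second_len], format(second_addr, "03b"),
                         _lanes(0, second_len)))
    return accesses
```

Note that for offsets 5 through 7 the second access wraps to A[29:31] = 000, as the table shows.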


C.10.4.4 Selecting D32 Mode 


The processor selects 64- or 32-bit data bus mode at power-up by sampling the state of the 
TLBISYNC signal at the negation of HRESET (coming out of hard reset). If the 
TLBISYNC signal is high (negated) at the negation of HRESET, 64-bit data mode is 
selected. If TLBISYNC is low (asserted), 32-bit data mode is used. 
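
The selection rule can be stated as a one-line Python sketch. The function name and the Boolean parameter are hypothetical; only the mapping from the sampled TLBISYNC level to the bus width comes from the text.

```python
def data_bus_width(tlbisync_high_at_hreset_negation):
    """Return the data bus width (64 or 32) selected coming out of hard reset.

    tlbisync_high_at_hreset_negation -- True if TLBISYNC is sampled high
    (negated) when HRESET negates, False if it is sampled low (asserted).
    """
    return 64 if tlbisync_high_at_hreset_negation else 32
```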


For 32-bit systems not using the TLBISYNC signal, TLBISYNC can be connected to 
HRESET directly. Otherwise, it can be connected to a pull-up resistor to select 64-bit mode. 
For systems using the TLBISYNC input function, the state of HRESET must be logically 
combined in the TLBISYNC generation path to select the desired mode.


C.10.4.5 Signal Relationships 


The signal relationships for 32-bit mode are the same as 64-bit mode. Figure C-14 and 
Figure C-15 show an example of an 8-beat burst transaction and a 2-beat burst transaction 
with DRTRY, respectively. 
[Timing diagram: SYSCLK, DH[0:31], TA, DRTRY, and TEA over bus clock cycles 1 through 12.]


Figure C-14. 32-Bit Data Bus Mode—8-Beat Burst (No Retry Conditions) 


[Timing diagram: SYSCLK, DBB, DH[0:31], TA, DRTRY, and TEA over bus clock cycles 1 through 12.]


Figure C-15. 32-Bit Data Bus Mode—2-Beat Burst (with DRTRY) 
C.11 MPC755 L2 Cache Interface Operation (Chapter 9)


This section describes the L2 cache interface operation of the MPC755, and how it differs 
from the MPC750. 


C.11.1 MPC755 L2 Cache Interface Overview 


The MPC755 L2 cache is implemented with an on-chip, two-way set-associative tag 
memory, and with external synchronous SRAMs for data storage, similar to the MPC750. 
The external SRAMs are accessed through a dedicated L2 cache port which supports a 
single bank of up to 1 Mbyte of synchronous SRAMs. The L2 cache normally operates in 
copyback mode and supports system cache coherency through snooping. The differences 
from the MPC750 L2 cache interface are summarized as follows: 

• Support for 4-1-1-1 PB3 synchronous burst-only SRAMs

• Additional control of the L2 interface during low-power operation

• Additional information about (and control of) the L2 DLL circuitry

• A new instruction-only mode

• Private memory capability for half or all of the L2 SRAM

• More flexible control of the L2 parity signals by allowing data or data and address
parity


In addition to including the MPC755-specific information, this section supersedes 
Chapter 9, “L2 Cache Interface Operation.” 


Figure C-16 shows a typical connection from the MPC755 processor L2 interface to a bank 
of PB3 SRAMs. See Chapter 9, “L2 Cache Interface Operation,” for typical connections to 
other SRAM technologies. Note that the signals for the L2 interface on the MPC755 are the 
same as those used for the MPC750. 
[Block diagram: the MPC755 L2 interface signals L2ADDR[16:0], L2DATA[0:63], L2DP[0:7]
(optional), L2CE, L2WE, L2ZZ, L2CLK_OUTA, L2CLK_OUTB (optional), L2SYNC_OUT, and
L2SYNC_IN connected to two 128K x 36 PB3 SRAMs.]

Notes:

1. For a 1-Mbyte L2, use address bits 0-16 (bit 0 is LSB).
   For a 512-Kbyte L2, use address bits 0-15 (bit 0 is LSB).
   For a 256-Kbyte L2, use address bits 0-14 (bit 0 is LSB).

2. External clock routing should ensure that the rising edge of the L2 clock is coincident
   at the K input of all SRAMs and at the L2SYNC_IN input of the MPC755. The clock ‘A’
   network alone can be used, or the clock ‘B’ network can also be used, depending on
   loading, frequency, and number of SRAMs.

3. No pull-up resistors are normally required for the L2 interface.

4. The MPC755 supports only one bank of SRAMs.

5. For high-speed operation, no more than two loads should be presented on each L2
   interface signal.


Figure C-16. Typical Synchronous 1-Mbyte L2 Cache System Using PB3 SRAM 
C.11.1.1 L2 Cache Organization 


The MPC750 L2 cache interface is implemented with an on-chip, two-way set-associative 
tag memory with 4096 tags per way, and a dedicated interface with support for up to 
1 Mbyte of external synchronous SRAM for data storage. The tags are sectored to support 
either two cache blocks per tag entry (two sectors, 64 bytes), or four cache blocks per tag 
entry (four sectors, 128 bytes) depending on the L2 cache size. If the L2 cache is configured 
for 256 Kbytes or 512 Kbytes of external SRAM, the tags are configured for two sectors 
per L2 cache block. The L2 tags are configured for four sectors per L2 cache block when 
1 Mbyte of external SRAM is used. Each sector (32-byte L1 cache block) in the L2 cache 
has its own valid and modified bits and other status bits that implement the MEI cache 
coherency protocol. 
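
The relationship between cache size and tag sectoring can be checked with simple arithmetic. The following Python sketch uses only the figures quoted above (two ways, 4096 tags per way, 32-byte sectors); the function names are hypothetical, and the tag count computed for a 256-Kbyte cache is plain arithmetic rather than a statement from this manual.

```python
WAYS = 2
TAGS_PER_WAY = 4096
SECTOR_BYTES = 32          # one L1 cache block per sector


def sectors_per_tag(l2_kbytes):
    """Sectors per tag entry for the supported L2 sizes (256, 512, 1024 Kbytes)."""
    if l2_kbytes in (256, 512):
        return 2
    if l2_kbytes == 1024:
        return 4
    raise ValueError("unsupported L2 size")


def tags_needed_per_way(l2_kbytes):
    """Tag entries per way needed to cover the SRAM at the given sectoring."""
    bytes_per_tag = sectors_per_tag(l2_kbytes) * SECTOR_BYTES
    return (l2_kbytes * 1024) // (WAYS * bytes_per_tag)
```

With four sectors per tag, 4096 tags per way cover exactly 1 Mbyte; with two sectors per tag, they cover exactly 512 Kbytes.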
Table C-26 lists the data RAM organizations for the various L2 cache sizes. Table C-26 also
indicates typical SRAM sizes that might be used to construct such a cache.


Table C-26. L2 Cache Sizes and Data RAM Organizations

L2 Cache Size | L2 Data RAM Organization          | Example SRAM Sizes
256 Kbytes    | 64/72 bit | 32 Kbytes x 64/72     | (2) 32 Kbytes x 32/36
512 Kbytes    | 64/72 bit | 64 Kbytes x 64/72     | (2) 64 Kbytes x 32/36
1 Mbyte       | 64/72 bit | 128 Kbytes x 64/72    | (2) 128 Kbytes x 32/36

Notes: 
The MPC755 supports only one bank of SRAMs. 
For very high speed operation, no more than two SRAM devices should be used. 


C.11.1.2 L2 Cache Control 


The L2 cache control register (L2CR) allows control of L2 cache configuration and timing, 
byte-level data parity generation and checking, global invalidation of L2 contents, 
write-through operation, and L2 test support. The L2 cache interface provides two clock 
outputs that allow the clock inputs of the SRAMs to be driven at select frequency divisions 
of the processor core frequency. See the MPC755 Hardware Specifications for details about 
the specific frequency ratios supported. For more details about the L2CR, see 
Section C.11.4.1, “L2 Cache Control Register (L2CR).” 


C.11.1.3 L2 Private Memory 


A portion, or all, of the L2 cache can alternately be used as a private SRAM. In this way, a 
portion of the physical address space can be mapped into a portion of the L2 SRAM. This 
functionality is described in Section C.11.2.2, “L2 Private Memory Operation.” When
private SRAM is used and the upper bits of the physical address match the L2PM[PMBA] 
field, the data is written or read from the private space of the L2 SRAM instead of external 
memory. Note that all of the SRAM can be designated as private, or for 512 Kbytes or 
1 Mbyte SRAM, half can be designated as private and half as L2 cache. See Table C-29 for 
all the supported combinations. Also, see Section C.11.6.5, “Cache Control Instructions 
and Effect on Private Memory Operation,” for information on the operation of cache control 
instructions with respect to private memory space. 
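
The address match described above can be pictured as a masked comparison. The Python sketch below is an assumption-laden illustration: the width of the L2PM[PMBA] field is not given in this section, so the sketch takes a full 32-bit base address and a power-of-two private size as hypothetical parameters and assumes the base is aligned to that size.

```python
def in_private_space(phys_addr, base_addr, private_bytes):
    """Return True if phys_addr falls in the private memory region.

    phys_addr     -- 32-bit physical address of the access
    base_addr     -- hypothetical full-width base address standing in for PMBA
    private_bytes -- size of the private region; assumed to be a power of two
    """
    if private_bytes <= 0 or private_bytes & (private_bytes - 1):
        raise ValueError("private_bytes must be a power of two")
    mask = ~(private_bytes - 1) & 0xFFFFFFFF   # keep only the upper address bits
    return (phys_addr & mask) == (base_addr & mask)
```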


C.11.2 L2 Interface Operation 


This section describes the general operation of both the L2 cache and the private memory 
capabilities of the L2 interface. 
C.11.2.1 L2 Cache Operation


The MPC755 L2 cache is a combined instruction and data cache that receives memory 
requests from both L1 instruction and data caches independently. The L1 requests are 
generally the result of instruction fetch misses, data load or store misses, write-through 
operations, or cache management instructions. Each L1 request generates an address 
lookup in the L2 tags. If a hit occurs, the instructions or data are forwarded to the L1 cache. 
A miss in the L2 tags causes the L1 request to be forwarded to the 60x bus interface. The 
cache block received from the bus is forwarded to the L1 cache immediately, and is also 
loaded into the L2 cache with the tag marked valid and unmodified. If the cache block 
loaded into the L2 causes a new tag entry to be allocated and the current tag entry is marked 
valid modified, the modified sectors of the tag to be replaced are cast out from the L2 cache 
to the 60x bus. 


See Section C.11.6.4, “Other Cache Control Instructions and Effect on L2 Cache,” for more
information on the operation of cache control operations on the L2 cache. 


C.11.2.1.1 L2 Cache Access Priorities 


At any given time the L1 instruction cache may have one instruction fetch request, and the 
L1 data cache may have one load and two stores requesting L2 cache access. The L2 cache 
also services snoop requests from the 60x bus. When there are multiple pending requests to 
the L2 cache, snoop requests have highest priority, followed by data load and store requests 
(serviced on a first-in, first-out basis). Instruction fetch requests have the lowest priority in 
accessing the L2 cache when there are multiple accesses pending. 
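
The priority order described above can be sketched as a small arbiter. This Python example is illustrative only; the three queues and the function name are hypothetical, and the model ignores timing entirely.

```python
from collections import deque


def next_l2_request(snoops, data_requests, ifetches):
    """Pop and return the next request to service, or None if all queues are empty.

    snoops        -- deque of pending snoop requests (highest priority)
    data_requests -- deque of data loads and stores, serviced first-in, first-out
    ifetches      -- deque of instruction fetch requests (lowest priority)
    """
    for queue in (snoops, data_requests, ifetches):
        if queue:
            return queue.popleft()
    return None
```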


If read requests from both the L1 instruction and data caches are pending, the L2 cache can
perform hit-under-miss operations and supply the available instruction or data while a
bus transaction for the previous L2 cache miss is performed. The L2 cache does not support
miss-under-miss, and the second instruction fetch or data load stalls until the bus operation 
resulting from the first L2 miss completes. 


C.11.2.1.2 L2 Cache Services 


All requests to the L2 cache that are marked cacheable (even if the respective L1 cache is 
disabled or locked) cause a tag lookup and will be serviced if the instructions or data are in 
the L2 cache. Burst requests from the L1 caches and single-beat read requests that hit in the 
L2 cache are forwarded the instructions or data, and the L2 LRU bit for that tag is updated. 
Burst writes from the L1 data cache due to a castout or replacement copyback are written 
only to the L2 cache, and the L2 cache sector is marked modified. Designers should note 
that during burst transfers into and out of the L2 cache SRAM array, an address is generated 
by the MPC755 for each data beat. 


If the L2 cache is configured as write-through, the L2 sector is marked unmodified, and the
write is forwarded to the 60x bus. If the L1 castout requires a new L2 tag entry to be
allocated and the current tag is marked modified, any modified sectors of the tag to be
replaced are cast out of the L2 cache to the 60x bus.


Single-beat reads that miss in the L2 cache do not cause any state changes in the L2 cache 
and are forwarded on the 60x bus interface. Cacheable single-beat store requests marked 
copyback that hit in the L2 are allowed to update the L2 cache sector, but do not cause L2 
cache sector allocation or deallocation. Cacheable, single-beat store requests that miss in 
the L2 are forwarded to the 60x bus. Single-beat store requests marked write-through 
(through address translation or through the configuration of L2CR[L2WT]) are written to
the L2 cache if they hit and are written to the 60x bus independent of the L2 hit/miss status. 
If the store hits in the L2 cache, the modified/unmodified status of the tag remains 
unchanged. 
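
The handling of cacheable single-beat stores can be summarized as a small decision function. The Python sketch below is hypothetical; in particular, it assumes that a copyback store that hits in the L2 is not also sent to the 60x bus, which the text implies but does not state.

```python
def single_beat_store(l2_hit, write_through):
    """Return (update_l2, forward_to_bus) for a cacheable single-beat store.

    l2_hit        -- True if the store address hits in the L2 tags
    write_through -- True if marked write-through by translation or L2CR[L2WT]
    """
    update_l2 = l2_hit                  # a hit updates the sector; a miss never allocates
    if write_through:
        forward_to_bus = True           # written to the bus regardless of hit or miss
    else:
        forward_to_bus = not l2_hit     # assumption: a copyback hit stays in the L2
    return update_l2, forward_to_bus
```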


C.11.2.1.3 L2 Cache Coherency and WIMG Bits 


Unlike the MPC750, a request to the L2 cache on the MPC755 that is marked
cache-inhibited by address translation (through either the MMU or by default WIMG 
configuration) will hit in the L2 cache if it has been previously loaded (and is still valid), 
causing a paradox condition. However, misses for cache-inhibited accesses do not cause a 
new entry to be allocated and do not cause any L2 cache tag state change. 


C.11.2.1.4 Single-Beat Accesses to L2 Interface 


The processor performs single-beat read and write accesses when the L1 instruction and/or
data caches are disabled, and when the WIMG bit settings indicate that an area of memory
is cache-inhibited (this case is not forwarded to the L2 interface). Additionally, single-beat
writes occur to the L2 interface when that area of memory is designated as write-through.
PB2 SRAMs naturally support single-beat read and write accesses. However, the L2
interface requires 64-bit accesses to the SRAM. Therefore, for single-beat writes, the
MPC750 and MPC755 automatically perform a read-modify-write operation in order to
write the complete 64 bits to the L2.


PB3 SRAMs support bursting accesses only. Thus, for PB3 SRAMs, the L2 interface 
always automatically performs a burst read for a complete cache line from the SRAM. If a 
single-beat read was requested, then the appropriate double word is forwarded to the L1. 
Write accesses to PB3 SRAMs also require burst accesses. Thus for a single-beat write, the 
L2 interface automatically performs a burst read-modify-write in order to perform the 
complete write burst. 
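
The read-modify-write step amounts to merging the store data into the double word read from the SRAM. The Python sketch below illustrates the idea for the 64-bit case; the function name is hypothetical, and the model treats the double word as eight bytes indexed from byte lane 0.

```python
def read_modify_write(old_double_word, offset, store_data):
    """Merge store_data into an 8-byte double word and return the new value.

    old_double_word -- the 8 bytes read back from the SRAM
    offset          -- byte offset of the store within the double word
    store_data      -- the bytes being stored (must fit in the double word)
    """
    if len(old_double_word) != 8:
        raise ValueError("expected an 8-byte double word")
    if offset < 0 or offset + len(store_data) > 8:
        raise ValueError("store does not fit in the double word")
    merged = bytearray(old_double_word)
    merged[offset:offset + len(store_data)] = store_data
    return bytes(merged)                # the full 64 bits written back to the L2
```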


C.11.2.2 L2 Private Memory Operation 


The L2 interface of the MPC755 can also be used as a low-latency, high-bandwidth private 
memory space. The private memory space is not snooped and is therefore not coherent with 
other processors in a system. The private space can contain instructions and data and its
contents can be cached in the L1 instruction and data caches provided the accesses are
marked cacheable. 


The private memory receives requests from both the L1 instruction cache and the L1 data 
cache independently. The L1 requests are generally the result of instruction misses, data 
load or store misses, L1 data cache castouts, write-through operations, or cache
management instructions. For all cacheable accesses, the L1 requests are looked-up in the 
L2 tags and compared with the corresponding PMBA bits of the L2PM. If a match occurs 
with L2PM[PMBA], the result of the L2 tag lookup is ignored and the request is forwarded 
to the external L2 SRAM interface as a private memory access. All transactions that read 
or write data, except those caused by the eciwx and ecowx instructions, are allowed to hit 
in the private memory space, regardless of the WIMG memory/cache attribute bits. 


Transactions caused by the icbi, sync, tlbie, tlbsync, eieio, eciwx, and ecowx instructions 
never hit in the private memory space and are forwarded to the system interface. Accesses 
caused by the dcbi instruction that hit in the private memory space are discarded (after
invalidating the L1 data cache). The private memory space does not have coherency state 
information. When the L1 data cache is reloaded for a cacheable load or store, the state will 
be exclusive or modified, respectively. 


Generally, the private memory operates according to the following: 
• Arbitration is shared with the L2 cache and thus uses the same priorities.


• Burst read requests from the L1 instruction or data caches that map to the private
memory space are forwarded data from the L2 SRAMs designated as private
memory. Cache-inhibited stores write the appropriate data to the L2 interface.


• Requests to the L2 interface that are marked cacheable by address translation (even
if the respective L1 cache is locked) are serviced by the L2 interface if they map to
the private memory space.


• Burst read and single-beat read requests from the L1 instruction or data caches that
map to the private memory space are forwarded data from the L2 SRAMs designated
as private memory.


• Burst read requests from the L1 instruction or data caches that do not map to private
memory space (and miss in the L2 cache, if enabled) initiate a burst read operation
from the system interface for the cache line that missed. The cache line received
from the bus is forwarded to the appropriate L1 cache (and the L2 cache, if enabled).


• Normal burst writes from the L1 data cache due to castouts (also referred to as
replacement copybacks) that map to the private memory space are written to the
external SRAMs designated as private memory regardless of the L2CR[L2IO]
setting. Burst writes that do not map to the private memory space are allocated in the
L2 cache (if enabled).


Note that software-generated single-beat reads and writes directed to the private memory 
SRAMs are handled in the same way as described for the SRAMs as L2 cache, and
read-modify-write transactions are performed automatically by the L2 controller as needed
as described in Section C.11.2.1.4, “Single-Beat Accesses to L2 Interface.” 


See Section C.11.6.4, “Other Cache Control Instructions and Effect on L2 Cache,” for more 
information on the operation of cache control operations on the L2 cache. However, the 
following apply to the private memory space: 

• Cacheable stwcx. operations are handled by the L1 data cache similarly to normal 
cacheable stores. The L2 interface does not treat stwcx. differently than a normal 
cacheable store. Cache-inhibited stwcx. accesses that hit in the private memory 
space write the appropriate data to the L2 interface and are not forwarded to the 
system interface. 

• dcbz operations that hit in the private memory space do not affect the data in the 
external SRAMs. They are handled entirely by the L1. 

• dcbf operations are issued to the L2 interface after being processed by the L1 data 
cache. If a dcbf that hits in the L1 data cache and requires a line push hits in the private 
memory space, the cache line is written to the L2 interface. dcbf operations that hit 
in the private memory space are never forwarded to the system interface. 

• dcbst instructions are issued to the L2 cache after being processed by the L1 data 
cache. If a dcbst that hits in the L1 data cache and requires a line push hits in the 
private memory space, the cache line is written to the external SRAMs. dcbst 
operations that hit in the private memory space are never forwarded to the system 
interface. 

• dcbi instructions that hit in the private memory space are discarded and are never 
forwarded to the system interface. 

• icbi instructions never affect the L2 interface and are just passed to the system 
interface for further processing. 

• sync, eieio, eciwx, ecowx, tlbie, and tlbsync instructions pass through the L2 
interface and are forwarded to the system interface for further processing. 


Note that L2 cache-related performance monitor events may not produce expected results 
when L2 private memory is enabled. Specifically, hits to the private memory are treated as 
L2 cache misses by the performance monitor. No new performance monitor events have 
been added to specifically support the L2 private memory. 


C.11.3 L2 Clocking 


The MPC755 generates the clock for the external L2 synchronous data RAMs in the same 
way as the MPC750. The clock frequency for the RAMs is divided down from the core 
clock frequency of the MPC755. The divided-down clock is then phase-adjusted by an 
on-chip delay-lock loop (DLL) circuit, sent out from the MPC755 to the external RAMs, 
and then returned as an input to the DLL so that the rising edge of the clock as seen at the 
external RAMs can be aligned to the clocking of the internal latches in the MPC755 L2 bus 
interface. 


The core-to-L2 frequency divisor for the L2 PLL is selected through the L2CLK bits of the 
L2CR register. Generally, the divisor must be chosen according to the frequency supported 
by the external RAMs, the internal core operating frequency, and the phase adjustment 
range that the L2 DLL supports. The L2 RAM frequency can be divided down from the core 
operating frequency as described in the MPC755 Hardware Specification. Additional 
supported frequency ratios for the MPC755 are also highlighted in the hardware 
specification. 
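
The divider selection described above can be sketched in a few lines. This is a minimal illustration, not a vendor procedure: it uses the L2CR[L2CLK] encodings listed in Table C-27, and the helper names (`l2_clock_mhz`, `pick_l2clk`) and the example frequencies are hypothetical. It checks only the SRAM frequency limit and the rule that the L2 clock cannot be slower than the 60x bus clock; the DLL phase adjustment range and the flow-through SRAM restriction are not modeled and must still be checked against the hardware specification.

```python
from fractions import Fraction

# L2CR[L2CLK] encodings and their core-to-L2 divide ratios (Table C-27).
# Encodings 0b011 and 0b111 are reserved; 0b000 stops the L2 clock and DLL.
L2CLK_DIVIDERS = {
    0b001: Fraction(1),
    0b010: Fraction(3, 2),
    0b100: Fraction(2),
    0b101: Fraction(5, 2),
    0b110: Fraction(3),
}


def l2_clock_mhz(core_mhz, l2clk_bits):
    """Return the L2 clock in MHz, or None when the L2 clock and DLL are disabled."""
    if l2clk_bits == 0b000:
        return None
    if l2clk_bits not in L2CLK_DIVIDERS:
        raise ValueError("reserved L2CLK encoding")
    return Fraction(core_mhz) / L2CLK_DIVIDERS[l2clk_bits]


def pick_l2clk(core_mhz, sram_max_mhz, bus_mhz):
    """Hypothetical helper: fastest ratio the SRAM supports that is not slower than the bus."""
    for bits, divider in sorted(L2CLK_DIVIDERS.items(), key=lambda kv: kv[1]):
        freq = Fraction(core_mhz) / divider
        if bus_mhz <= freq <= sram_max_mhz:
            return bits
    return None  # no legal ratio for this combination
```

For example, a 400 MHz core with SRAMs rated for 200 MHz and a 100 MHz bus would select the divide-by-2 encoding under these assumptions.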


C.11.4 L2 Registers 


This section describes the cache configuration bits in the L2 cache control register (L2CR) 
and the L2 cache private memory control register (L2PM). 


C.11.4.1 L2 Cache Control Register (L2CR) 


The L2 cache control register of the MPC755 is a read/write, supervisor-level, 
implementation-specific SPR used to configure and operate the L2 cache, and it is slightly 
different from the L2CR of the MPC750. The differences are summarized as follows: 


• New encoding for L2RAM field defined for PB3 SRAM support 

• More output hold options defined for L2OH field 

• New L2CR bit for instruction-only mode (L2IO) 

• New L2CR fields defined for low-power operation and DLL control (L2CS, 
L2DRO, and L2CTR) 


The L2CR register can be accessed with the mtspr and mfspr instructions using SPR 1017 
(decimal). Note that all bits of L2CR are cleared by a hard reset and on power-on reset. 
Figure C-17 shows the bits of the L2CR. 


[Figure: L2CR bit fields, bit 0 (most significant) to bit 31. Bit 0 L2E; 1 L2PE; 2-3 L2SIZ; 
4-6 L2CLK; 7-8 L2RAM; 9 L2DO; 10 L2I; 11 L2CTL; 12 L2WT; 13 L2TS; 14-15 L2OH; 16 L2SL; 
17 L2DF; 18 L2BYP; 19-20 reserved; 21 L2IO; 22 L2CS; 23 L2DRO; 24-30 L2CTR; 31 L2IP.] 


Figure C-17. L2 Cache Control Register (L2CR) 


The L2CR bits for the MPC755 are described in Table C-27. 

Table C-27. L2 Cache Control Register 

Bits | Name | Description 

0 | L2E | L2 enable. Enables L2 cache operation (including snooping) starting with the next 
transaction the L2 cache unit receives. Before enabling the L2 cache, the L2 clock must be 
configured through L2CR[L2CLK], the L2 DLL must stabilize (see the MPC755 Hardware 
Specifications), and all other L2CR bits must be set appropriately. The L2 cache may need to be 
globally invalidated. 

1 | L2PE | L2 data parity generation and checking enable. Enables parity generation and checking 
for the L2 data RAM interface. When disabled, generated parity is always zeros. Note that the L2 
interface always generates and drives parity on the L2DP[0:7] signals for writes to the SRAM array. 

2-3 | L2SIZ | L2 size. Should be set according to the size of the L2 data RAMs used. A 256-Kbyte 
L2 cache requires a data RAM configuration of 32 Kbytes x 64 bits; a 512-Kbyte L2 cache requires a 
configuration of 64 Kbytes x 64 bits; a 1-Mbyte L2 cache requires a configuration of 128 Kbytes x 
64 bits. 
    00 Reserved 
    01 256 Kbyte 
    10 512 Kbyte 
    11 1 Mbyte 

4-6 | L2CLK | L2 clock ratio (core-to-L2 frequency divider). Specifies the clock divider ratio 
between the core clock frequency and the L2 data RAM interface. When these bits are cleared, the 
L2 clock is stopped and the on-chip DLL for the L2 interface is disabled. For non-zero values, the 
processor generates the L2 clock and the on-chip DLL is enabled. After the L2 clock ratio is chosen, 
the DLL must stabilize before the L2 interface can be enabled (see the MPC755 Hardware 
Specifications). The resulting L2 clock frequency cannot be slower than the clock frequency of the 
60x bus interface. 
    000 L2 clock and DLL disabled 
    001 ÷1 
    010 ÷1.5 
    011 Reserved 
    100 ÷2 
    101 ÷2.5 
    110 ÷3 
    111 Reserved 

7-8 | L2RAM | L2 RAM type. Configures the L2 interface for the type of synchronous SRAMs used: 
    • Flow-through (register-buffer) synchronous burst SRAMs that clock addresses in and flow 
      data out 
    • Pipelined (register-register) PB2 synchronous burst SRAMs that clock addresses in and clock 
      data out (with 3-1-1-1 access times) 
    • Pipelined (register-register) PB3 synchronous burst SRAMs (with 4-1-1-1 access times) 
    • Late-write synchronous SRAMs, for which the MPC755 requires a pipelined 
      (register-register) configuration. Late-write RAMs require write data to be valid on the 
      cycle after WE is asserted rather than on the same cycle as the write enable (as required 
      with traditional burst RAMs). 
For the PB2 burst RAM selection, the MPC755 does not burst data into the L2 cache; it generates 
an address for each access. However, for the PB3 burst RAM selection, the MPC755 does burst 
data into the L2 cache. If the SRAMs or part of the SRAM is configured as an L2 cache, the L1 
caches should be enabled for data to be efficiently loaded into the L2 cache for all types of SRAMs; 
otherwise, significant latencies are incurred. If all the L2 SRAM is configured as private 
memory, disabled L1 instruction and data caches do not affect the L2 latencies. 
Pipelined SRAMs may be used for all L2 clock modes. Note that flow-through SRAMs can be used 
only for L2 clock modes that are divide-by-2 or slower (divide-by-1 and divide-by-1.5 not allowed). 
    00 Flow-through (register-buffer) synchronous burst SRAM 
    01 Pipelined (register-register) PB3 synchronous burst SRAM 
    10 Pipelined (register-register) PB2 synchronous burst SRAM 
    11 Pipelined (register-register) synchronous late-write SRAM 


9 | L2DO | L2 data-only. Setting this bit enables data-only operation in the L2 cache. For this 
operation, instruction transactions from the L1 instruction cache already cached in the L2 cache 
can hit in the L2, but new instruction transactions from the L1 instruction cache are treated as 
cache-inhibited (bypass L2 cache, no L2 checking done). When both L2DO and L2IO are set, the L2 
cache is effectively locked (cache misses do not cause new entries to be allocated but write hits 
use the L2). 

10 | L2I | L2 global invalidate. Setting L2I invalidates the L2 cache globally by clearing the L2 
bits including status bits. This bit must not be set while the L2 cache is enabled. 

11 | L2CTL | L2 RAM control (ZZ enable). Setting L2CTL enables the automatic operation of the L2ZZ 
(low-power mode) signal for cache RAMs that support the ZZ function (PB2 RAMs). If L2CTL is set, 
L2ZZ asserts automatically when the MPC755 enters nap or sleep mode and negates automatically when 
the MPC755 exits nap or sleep mode. 
The use of this bit is not recommended for future compatibility. This bit should not be set when the 
MPC755 is in nap mode and snooping is to be performed through the negation of QACK. 
Additionally, it should not be set when using PB3 SRAMs. 

12 | L2WT | L2 write-through. Setting L2WT selects write-through mode (rather than the default 
write-back mode) so all writes to the L2 cache also write through to the 60x bus. For these writes, 
the L2 cache entry is always marked as exclusive rather than modified. This bit must never be set 
after the L2 cache has been enabled because previously modified lines could get re-marked as 
exclusive during normal operation. 

13 | L2TS | L2 test support. Setting L2TS causes cache block pushes from the L1 data cache that 
result from dcbf and dcbst instructions to be written only into the L2 cache and marked valid, 
rather than being written only to the 60x bus and marked invalid in the L2 cache in case of a hit. 
This bit allows a dcbz/dcbf instruction sequence to be used with the L1 cache enabled to easily 
initialize the L2 cache with any address and data information. This bit also keeps dcbz 
instructions from being broadcast on the 60x bus and single-beat cacheable store misses in the L2 
from being written to the 60x bus. 

14-15 | L2OH | L2 output hold. These bits configure output hold time for address, data, and 
control signals driven by the MPC755 to the L2 data RAMs. They should generally be set according to 
the SRAM’s input hold time requirements, for which late-write SRAMs usually differ from 
flow-through or burst SRAMs. See the MPC755 Hardware Specification for the actual recommended 
values. 
    00 Least hold time 
    01 More hold time 
    10 Even more hold time 
    11 Most output hold time 

16 | L2SL | L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay line. It is 
intended to increase the delay through the DLL to accommodate slower L2 RAM bus frequencies. 
Generally, L2SL should be set if the L2 RAM interface is operated below 110 MHz. 

17 | L2DF | L2 differential clock. Setting L2DF configures the two clock-out signals 
(L2CLK_OUTA and L2CLK_OUTB) of the L2 interface to operate as one differential clock. In this mode, 
the B clock is driven as the logical complement of the A clock. This mode supports the differential 
clock requirements of late-write SRAMs. Generally, this bit should be set when late-write SRAMs 
are used. 


18 | L2BYP | L2 DLL bypass. The DLL unit receives three input clocks: 
    1. A square-wave clock from the PLL unit to phase adjust and export 
    2. A non-square-wave clock for the internal phase reference 
    3. A feedback clock (L2SYNC_IN) for the external phase reference 
Setting L2BYP causes the non-square-wave clock (#2) to be used for both phase adjust and phase 
reference (#1 and #2), thus bypassing the square-wave clock from the PLL. (Note that clock #2 is 
the actual clock used by the registers of the L2 interface circuitry.) L2BYP is intended for use when 
the PLL is being bypassed. If the PLL is being bypassed, the DLL must be operated in 1:1 mode 
and SYSCLK must be fast enough for the DLL to support. 

19-20 | Reserved | These bits are implemented but not used; keep at 0 for future compatibility. 

21 | L2IO | L2 instruction-only. Setting this bit enables instruction-only operation in the L2 
cache. For this operation, data transactions from the L1 data cache already cached in the L2 cache 
can hit in the L2 (including writes), but new data transactions (transactions that miss in the L2) 
from the L1 data cache are treated as cache-inhibited (bypass L2 cache, no L2 checking done). When 
both L2DO and L2IO are set, the L2 cache is effectively locked (cache misses do not cause new 
entries to be allocated but write hits use the L2). Note that this bit can be programmed 
dynamically. 

22 | L2CS | L2 clock stop. Setting this bit causes the L2 clocks to the SRAMs to automatically 
stop whenever the MPC755 enters nap or sleep modes, and automatically restart when exiting those 
modes (including for snooping during nap mode). It operates by asynchronously gating off the 
L2CLK_OUT[A:B] signals while in nap or sleep mode. The L2 SYNC_OUT/SYNC_IN path remains 
in operation, keeping the DLL synchronized. This bit is provided as a power-saving alternative to the 
L2CTL bit and its corresponding ZZ pin, which may not be useful for dynamic stopping/restarting of 
the L2 interface from nap and sleep modes due to the relatively long recovery time from ZZ negation 
that many SRAM vendors require. 

23 | L2DRO | L2 DLL rollover. Setting this bit enables a potential rollover (or actual rollover) 
condition of the DLL to cause a checkstop for the processor. A potential rollover condition occurs 
when the DLL is selecting the last tap of the delay line, and thus may risk rolling over to the 
first tap with one adjustment while in the process of keeping synchronized. Such a condition is 
improper operation for the DLL, and, while this condition is not expected, it allows detection for 
added security. This bit should be set when the DLL is first enabled (set with the L2CLK bits) to 
detect rollover during initial synchronization. It could also be set when the L2 cache is enabled 
(with L2E bit) after the DLL has achieved its initial lock. 

24-30 | L2CTR | L2 DLL counter (read-only). These bits indicate the current value of the DLL 
counter (0 to 127). They are asynchronously read when the L2CR is read, and as such, should be read 
at least twice with the same value in case the value is asynchronously caught in transition. These 
bits are intended to provide observability of where in the 128-bit delay chain the DLL is at any 
given time. Generally, the DLL operation should be considered at risk if it is found to be within a 
couple of taps of its beginning or end point (tap 0 or tap 128). 

31 | L2IP | L2 global invalidate in progress (read-only). This read-only bit indicates whether an 
L2 global invalidate is occurring. It should be monitored after an L2 global invalidate has been 
initiated by the L2I bit to determine when it has completed. 
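
The bit positions in Table C-27 use PowerPC numbering, in which bit 0 is the most significant bit of the 32-bit register. The following sketch shows one way to turn those positions into masks and shifts. The helper names `encode_l2cr` and `decode_l2cr` are hypothetical, and the code models only the register layout, not hardware behavior (for example, it does not enforce that L2CTR and L2IP are read-only).

```python
# (first_bit, last_bit) per Table C-27, PowerPC numbering: bit 0 is the MSB.
L2CR_FIELDS = {
    "L2E": (0, 0), "L2PE": (1, 1), "L2SIZ": (2, 3), "L2CLK": (4, 6),
    "L2RAM": (7, 8), "L2DO": (9, 9), "L2I": (10, 10), "L2CTL": (11, 11),
    "L2WT": (12, 12), "L2TS": (13, 13), "L2OH": (14, 15), "L2SL": (16, 16),
    "L2DF": (17, 17), "L2BYP": (18, 18), "L2IO": (21, 21), "L2CS": (22, 22),
    "L2DRO": (23, 23), "L2CTR": (24, 30), "L2IP": (31, 31),
}


def encode_l2cr(**fields):
    """Pack named field values into a 32-bit L2CR image (hypothetical helper)."""
    value = 0
    for name, field_value in fields.items():
        first, last = L2CR_FIELDS[name]
        width = last - first + 1
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        # A field ending at big-endian bit `last` sits (31 - last) bits above the LSB.
        value |= field_value << (31 - last)
    return value


def decode_l2cr(value):
    """Split a 32-bit L2CR image into a dict of field values."""
    return {
        name: (value >> (31 - last)) & ((1 << (last - first + 1)) - 1)
        for name, (first, last) in L2CR_FIELDS.items()
    }
```

For instance, `encode_l2cr(L2SIZ=0b10, L2CLK=0b100, L2RAM=0b10)` describes a 512-Kbyte, divide-by-2, PB2 pipelined configuration with the cache still disabled.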


C.11.4.2 L2 Private Memory Control Register (L2PM) 


The L2 private memory control register is a new register in the MPC755 that allows a 
portion of the physical address space to be mapped into a portion of the L2 SRAM. It is a 
read/write, supervisor-level, implementation-specific register (SPR) which is accessed with 
the mtspr and mfspr instructions using SPR 1016 (decimal). Note that all bits of L2PM are 
cleared by a hard reset or power-on reset. Figure C-18 shows the bits of the L2PM. 

[Figure: L2PM bit fields. Bits 0-13 PMBA; bits 14-29 reserved; bits 30-31 PMSIZ.] 

Figure C-18. L2 Private Memory Control Register (L2PM) 


The L2PM bits are described in Table C-28. 

Table C-28. L2PM Bit Settings 

Bits | Name | Description 

0-13 | PMBA | Private memory base address. If the upper bits of the physical address match the 
PMBA, the data is written to or read from the private memory space of the L2 SRAM instead of 
external memory. 
    0-11 for 1 Mbyte 
    0-12 for 512 Kbytes 
    0-13 for 256 Kbytes 

14-29 | Reserved | 

30-31 | PMSIZ | Private memory size. These bits along with the L2SIZ bits of the L2CR determine 
the amount of the L2 cache that is used as private memory space. See Table C-29 for the L2 SRAM 
configurations. 
    00 = Private memory disabled 
    01 = 256 Kbytes 
    10 = 512 Kbytes 
    11 = 1 Mbyte 

















Table C-29 describes the combinations possible (and the required bit settings) for using 
some or all of the L2 SRAM as private memory. 


Table C-29. L2 SRAM Configuration 

Total L2 SRAM | Configured Only as L2 Cache | Configured as 1/2 L2 Cache and 1/2 Private Memory | Configured Only as Private Memory 

256 Kbytes | L2E = 1, L2SIZ = 01 (256 Kbytes), PMSIZ = 00 (disabled) | Not available | L2E = 0, L2SIZ = don’t care, PMSIZ = 01 (256 Kbytes) 

512 Kbytes | L2E = 1, L2SIZ = 10 (512 Kbytes), PMSIZ = 00 (disabled) | L2E = 1, L2SIZ = 01 (256 Kbytes), PMSIZ = 01 (256 Kbytes) | L2E = 0, L2SIZ = don’t care, PMSIZ = 10 (512 Kbytes) 

1 Mbyte | L2E = 1, L2SIZ = 11 (1 Mbyte), PMSIZ = 00 (disabled) | L2E = 1, L2SIZ = 10 (512 Kbytes), PMSIZ = 10 (512 Kbytes) | L2E = 0, L2SIZ = don’t care, PMSIZ = 11 (1 Mbyte) 
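
The address match implied by PMBA and PMSIZ can be illustrated as follows. This sketch rests on two assumptions that the tables suggest but do not state outright: that the "0-11", "0-12", and "0-13" entries in Table C-28 give the PMBA bits compared for each private memory size, and that the base address is therefore aligned to the private memory size. The helper names `make_l2pm` and `in_private_memory` are hypothetical.

```python
# PMSIZ encodings from Table C-28.
PM_SIZE_BYTES = {0b01: 256 * 1024, 0b10: 512 * 1024, 0b11: 1024 * 1024}
# Assumed number of upper address bits compared for each PMSIZ encoding.
COMPARE_BITS = {0b01: 14, 0b10: 13, 0b11: 12}


def make_l2pm(base_address, pmsiz):
    """Build an L2PM image: PMBA in bits 0-13 (upper 14 bits), PMSIZ in bits 30-31."""
    size = PM_SIZE_BYTES[pmsiz]
    if base_address % size:
        raise ValueError("base address assumed to be aligned to the private memory size")
    return (base_address & 0xFFFC0000) | pmsiz


def in_private_memory(l2pm, physical_address):
    """Return True if the physical address would map to private memory (assumed rule)."""
    pmsiz = l2pm & 0b11
    if pmsiz == 0b00:  # private memory disabled
        return False
    shift = 32 - COMPARE_BITS[pmsiz]
    return (physical_address >> shift) == (l2pm >> shift)
```

With a 1-Mbyte private memory based at 0xFFF0_0000, for example, every address from 0xFFF0_0000 through 0xFFFF_FFFF would match under these assumptions.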








C.11.5 L2 Address and Data Parity Signals 


The L2 parity signals (L2DP[0:7]) can be generated and checked by setting the L2PE bit in 
the L2CR. The parity bits are generated and checked using the corresponding L2DATA 
signals, and represent odd parity. If the L2AP_EN bit in HID2 is also set, the L2ADDR 
signals are also included in the parity generation and checking (again, representing odd 
parity) on the MPC755. Table C-30 lists the association between L2DP[0:7] signals and the 
L2DATA and L2ADDR signals. 


Table C-30. L2 Data Parity Signal Associations 

Signal | L2AP_EN = 0, L2PE = 1 | L2AP_EN = 1, L2PE = 1 
L2DP0 | L2DATA[0:7] | L2DATA[0:7], L2ADDR[0:2] 
L2DP1 | L2DATA[8:15] | L2DATA[8:15], L2ADDR[3:4] 
L2DP2 | L2DATA[16:23] | L2DATA[16:23], L2ADDR[5:6] 
L2DP3 | L2DATA[24:31] | L2DATA[24:31], L2ADDR[7:8] 
L2DP4 | L2DATA[32:39] | L2DATA[32:39], L2ADDR[9:10] 
L2DP5 | L2DATA[40:47] | L2DATA[40:47], L2ADDR[11:12] 
L2DP6 | L2DATA[48:55] | L2DATA[48:55], L2ADDR[13:14] 
L2DP7 | L2DATA[56:63] | L2DATA[56:63], L2ADDR[15:16] 
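
As an illustration of the associations in Table C-30, the sketch below computes the eight parity bits for the L2PE = 1 case. It assumes the usual meaning of odd parity, namely that each parity bit is chosen so that the covered signals plus the parity bit contain an odd number of ones. To avoid assuming any particular bit ordering within a word, the inputs are lists indexed by signal number (`data_bits[n]` is L2DATA[n], `addr_bits[n]` is L2ADDR[n]). The function name `l2dp` is hypothetical.

```python
# L2ADDR groups covered by L2DP0..L2DP7 when L2AP_EN = 1 (Table C-30).
ADDR_GROUPS = [(0, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]


def l2dp(data_bits, addr_bits=None):
    """Return L2DP[0:7] as a list of eight 0/1 values.

    data_bits: 64 values, data_bits[n] is L2DATA[n].
    addr_bits: optional 17 values, addr_bits[n] is L2ADDR[n]; pass it to
               model L2AP_EN = 1, omit it to model L2AP_EN = 0.
    """
    parity = []
    for n in range(8):
        covered = list(data_bits[8 * n: 8 * n + 8])
        if addr_bits is not None:
            first, last = ADDR_GROUPS[n]
            covered += list(addr_bits[first:last + 1])
        # Odd parity: drive 1 when the covered signals hold an even number of ones.
        parity.append(1 if sum(covered) % 2 == 0 else 0)
    return parity
```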

















C.11.6 L2 Cache Programming Considerations 


This section describes some of the programming considerations for controlling the L2 
cache and the effect of other cache control instructions on the L2 cache. 


C.11.6.1 Enabling and Disabling the L2 Cache 


Following a power-on or hard reset, the L2 cache and the L2 DLL are disabled initially. 
Before enabling the L2 cache, the L2 DLL must first be configured through the L2CR 
register, and the DLL must be allowed sufficient time (see the MPC755 Hardware 
Specifications) to achieve phase lock. Before enabling the L2 cache, other configuration 
parameters must be set in the L2CR, and the L2 tags must be globally invalidated. The L2 
cache should be initialized during system start-up. 


The sequence for initializing the L2 cache is as follows: 

• Power-on reset (automatically performed by the assertion of the HRESET signal). 

• Disable the L2 cache by clearing L2CR[L2E]. 

• Set the L2CR[L2CLK] bits to the desired clock divider setting. Setting a non-zero 
value automatically enables the DLL. All other L2 cache configuration bits should 
be set to properly configure the L2 cache interface for the SRAM type, size, and 
interface timing required. 

• Wait for the L2 DLL to achieve phase lock. This can be timed by setting the 
decrementer for a time period equal to 640 L2 clocks, or by performing an L2 global 
invalidate. 

• Perform an L2 global invalidate. The global invalidate could be performed before 
enabling the DLL, or in parallel with waiting for the DLL to stabilize. Refer to 
Section C.11.6.2, “L2 Cache Global Invalidation,” for more information about L2 
cache global invalidation. Note that a global invalidate always takes much longer 
than it takes for the DLL to stabilize. 

• After the DLL stabilizes, an L2 global invalidate has been performed, and the other 
L2 configuration bits have been set, enable the L2 cache for normal operation by 
setting the L2CR[L2E] bit to 1. 
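
The DLL wait in the sequence above is stated in L2 clocks, so it has to be converted to decrementer ticks before it can be timed. The sketch below shows that conversion with integer arithmetic. The decrementer frequency is a caller-supplied assumption here (it depends on the system), and the function name `dll_lock_ticks` is hypothetical.

```python
DLL_LOCK_L2_CLOCKS = 640  # wait period given in the initialization sequence


def dll_lock_ticks(l2_clock_hz, decrementer_hz, l2_clocks=DLL_LOCK_L2_CLOCKS):
    """Return the number of decrementer ticks that covers `l2_clocks` L2 clock periods.

    Both frequencies are integers in Hz. The result is rounded up so that
    the wait is never shorter than the requested number of L2 clocks.
    """
    if l2_clock_hz <= 0 or decrementer_hz <= 0:
        raise ValueError("frequencies must be positive")
    numerator = l2_clocks * decrementer_hz
    return -(-numerator // l2_clock_hz)  # ceiling division on integers
```

With a 200 MHz L2 clock, 640 L2 clocks last 3.2 microseconds; a decrementer assumed to tick at 25 MHz would need 80 ticks to cover that interval.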


Note that if the L1 data cache is disabled and the L2 cache is enabled, hits in the L2 work 
correctly and update the L2. However, no new entries are allocated into the L2 because 
when the L1 data cache is disabled, the processor only performs single-beat accesses. Thus, 
these accesses all propagate to the 60x bus interface (the L2 only stores and allocates entries 
for burst accesses). 


Before the L2 cache is disabled it must be flushed to prevent coherency problems. Note that 
the cache management instructions dcbf, dcbst, and dcbi do not affect the L1 data cache 
or L2 cache when they are disabled. 


C.11.6.2 L2 Cache Global Invalidation 


The L2 cache supports a global invalidation function in which all bits of the L2 tags (tag 
data bits, tag status bits, and LRU bit) are cleared. It is performed by an on-chip hardware 
state machine that sequentially cycles through the L2 tags. The global invalidation function 
is controlled through L2CR[L2I], and it must be performed only while the L2 cache is 
disabled. The MPC755 can continue operation during a global invalidation provided the L2 
cache has been properly disabled before the global invalidation operation starts. Note that 
the MPC755 must be operating at full power (low power modes disabled) in order to 
perform L2 cache invalidation. 


The sequence for performing a global invalidation of the L2 cache is as follows: 

• Clear the HID0[DPM] bit to zero. Dynamic power management must be disabled. 

• Execute a sync instruction to finish any pending store operations in the load/store 
unit, disable the L2 cache by clearing L2CR[L2E], and execute an additional sync 
instruction after disabling the L2 cache to ensure that any pending operations in the 
L2 cache unit have completed. 

• Initiate the global invalidation operation by setting the L2CR[L2I] bit to 1. 

• Monitor the L2CR[L2IP] bit to determine when the global invalidation operation is 
completed (indicated by the clearing of L2CR[L2IP]). The global invalidation 
requires approximately 32K core clock cycles to complete. 

• After detecting the clearing of L2CR[L2IP], clear L2CR[L2I] and re-enable the L2 
cache for normal operation by setting L2CR[L2E]. Also, dynamic power 
management can be enabled at this time. 
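
The ordering of the steps above can be exercised against a small software model. This is a sketch only: `FakeL2` is a hypothetical stand-in for the hardware in which L2IP reads as 1 for a fixed number of polls after L2I is set, the HID0[DPM] step is omitted because HID0 is not modeled, and real code would use mtspr, mfspr, and sync rather than Python method calls. The mask values follow from the bit positions in Table C-27 (bit 0 is the most significant bit).

```python
L2E = 0x80000000   # L2CR bit 0
L2I = 0x00200000   # L2CR bit 10
L2IP = 0x00000001  # L2CR bit 31 (read-only)


class FakeL2:
    """Hypothetical model: L2IP stays set for `busy_polls` reads after L2I is set."""

    def __init__(self, l2cr=0, busy_polls=3):
        self.l2cr = l2cr
        self.busy = 0
        self.busy_polls = busy_polls
        self.log = []

    def sync(self):
        self.log.append("sync")

    def write_l2cr(self, value):
        if (value & L2I) and not (self.l2cr & L2I):
            self.busy = self.busy_polls  # invalidation starts
        self.l2cr = value & ~L2IP        # writes cannot change L2IP
        self.log.append("write")

    def read_l2cr(self):
        if self.busy:
            self.busy -= 1
            return self.l2cr | L2IP
        return self.l2cr


def global_invalidate(dev):
    """Follow the documented order: sync, disable, sync, set L2I, poll L2IP, clear L2I, enable."""
    dev.sync()
    dev.write_l2cr(dev.read_l2cr() & ~L2E)
    dev.sync()
    dev.write_l2cr(dev.read_l2cr() | L2I)
    while dev.read_l2cr() & L2IP:
        pass  # invalidation still in progress
    value = dev.read_l2cr() & ~L2I
    dev.write_l2cr(value)
    dev.write_l2cr(value | L2E)
```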


C.11.6.3 L2 Cache Flushing 


L1 cache-block-push operations generated by the execution of dcbf and dcbst instructions 
write through to the 60x bus interface and invalidate the L2 cache sector if they hit. The 
execution of dcbf and dcbst instructions that do not cause a cache-block-push from the L1 
cache are forwarded to the L2 cache to perform a sector invalidation and/or push from the 
L2 cache to the 60x bus as required. If the dcbf and dcbst instructions do not cause a sector 
push from the L2 cache, they are forwarded to the 60x bus interface for address-only 
broadcast if HID0[ABE] is set to 1. 


C.11.6.4 Other Cache Control Instructions and Effect on L2 Cache 


The execution of the stwcx. instruction results in single-beat writes from the L1 data cache. 
These single-beat writes are processed by the L2 cache according to hit/miss status, L1 and 
L2 write-through configuration, and reservation-active status. If the address associated with 
the stwcx. instruction misses in the L2 cache or if the reservation is no longer active, the 
stwcx. instruction bypasses the L2 cache and is forwarded to the 60x bus interface. If the 
stwcx. hits in the L2 cache and the reservation is still active, one of the following actions 
occurs: 

• If the stwcx. hits a modified sector in the L2 cache (independent of write-through 
status), or if the stwcx. hits both the L1 and L2 caches in copy-back mode, the stwcx. 
is written to the L2 and the reservation completes. 

• If the stwcx. hits an unmodified sector in the L2 cache, and either the L1 or L2 is in 
write-through mode, the stwcx. is forwarded to the 60x bus interface and the sector 
hit in the L2 cache is invalidated. 

The dcbi instruction is always forwarded to the L2 cache and causes a segment invalidation 
if a hit occurs. The dcbi instruction is also forwarded to the 60x bus interface for broadcast 
if HID0[ABE] is set to 1. The icbi instruction invalidates only L1 cache blocks and is never 
forwarded to the L2 cache. Any dcbz instructions marked global do not affect the L2 cache 
state. If a dcbz instruction hits in the L1 and L2 caches, the L1 data cache block is cleared 
and the dcbz instruction completes. If a dcbz instruction misses in the L2 cache, it is 
forwarded to the 60x bus interface for broadcast. Any dcbz instructions that are marked 
nonglobal act only on the L1 data cache. Note that the dcbz instruction on the MPC755 
must be preceded by a dcbf instruction to that address. 


The sync and eieio instructions bypass the L2 cache and are forwarded to the 60x bus. 


C.11.6.5 Cache Control Instructions and Effect on Private Memory Operation 


When private memory is used as all or part of the L2 interface, cache control instructions 
function as follows: 

• Cacheable stwcx. operations are handled by the L1 data cache similarly to normal 
cacheable stores. The L2 interface does not treat stwcx. differently than a normal 
cacheable store. Cache-inhibited stwcx. accesses that hit in the private memory 
space write the appropriate data to the L2 interface and are not forwarded to the 
system interface. 

• dcbz operations that hit in the private memory space do not affect the data in the 
external SRAMs. They are handled entirely by the L1. 

• dcbf operations are issued to the L2 interface after being processed by the L1 data 
cache. If a dcbf that hits in the L1 data cache and requires a line push hits in the private 
memory space, the cache line is written to the L2 interface. dcbf operations that hit 
in the private memory space are never forwarded to the system interface. 

• dcbst instructions are issued to the L2 cache after being processed by the L1 data 
cache. If a dcbst that hits in the L1 data cache and requires a line push hits in the 
private memory space, the cache line is written to the external SRAMs. dcbst 
operations that hit in the private memory space are never forwarded to the system 
interface. 

• dcbi instructions that hit in the private memory space are discarded and are never 
forwarded to the system interface. 

• icbi instructions never affect the L2 interface and are just passed to the system 
interface for further processing. 

• sync, eieio, eciwx, ecowx, tlbie, and tlbsync instructions pass through the L2 
interface and are forwarded to the system interface for further processing. 


C.11.6.6 L2 Cache Testing 


Several features are provided to facilitate testing of the L2 cache. The original MPC750 
User’s Manual supplied some incorrect recommended procedures for testing the L2 cache. 
This section contains a corrected L2 cache test description that applies for both the 
MPC750 and the MPC755. 


A typical test for verifying the proper operation of the MPC755 L2 cache memory (external 
SRAM and tag) performs the following steps: 


1. Initialize the test sequence by disabling address translation to invoke the default 
WIMG setting of 0b0011. 

2. Set the L2CR[L2DO] and L2CR[L2TS] bits and perform a global invalidation of 
the L1 data cache and the L2 cache. The L1 instruction cache can remain enabled to 
improve execution efficiency. 

After initialization of the test sequence is complete, the L2 cache external SRAM may be 
tested using the following procedure: 

1. Enable the L2 cache and the L1 data cache. Caches should have been invalidated 
during the initialization step. 

2. Execute a series of dcbz, stw, and dcbf instructions to initialize the cache with a 
sequential range of addresses and with cache data consisting of zeros. 

3. Disable the L1 data cache. 

4. Initialize the performance monitor counters to zero, and enable counting of L2 hits 
in the appropriate MMCR register. Refer to Chapter 11, “Performance Monitor,” 
for complete details on using the performance monitors. 

5. Perform a series of single-beat load and store operations using a variety of non-zero 
bit patterns to test for stuck bits and pattern sensitivities in the L2 cache SRAM. 
These loads and stores should be in the range of addresses used to initialize the 
caches in step 2 so that each access will hit in the L2 cache. 

6. Disable the performance monitor counters, and read the value for the L2 cache hits. 
Verify that this result matches the accesses performed by the test routine. 


A complete L2 cache test should test the tag memory as well as the SRAMs. Each bit of tag 
memory should be tested by loading the cache tags with data consisting of all zeros in one 
way of the cache and all ones in the other way. Then, a series of accesses should be 
performed, walking a one or zero through the upper address bits to test for stuck bits and 
pattern sensitivities in the tag. 


The number of tag bits used by the cache depends on the size of the cache. On the MPC750 
and the MPC755, a 256-Kbyte cache uses 15 tag bits, a 512-Kbyte cache uses 14 tag bits, 
and a 1-Mbyte cache uses 13 tag bits. 
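
The tag widths quoted above are consistent with a simple calculation, shown here as a sketch. It assumes a 32-bit physical address and a two-way organization (the test description refers to way 0 and way 1), so that each way spans half of the cache and the tag is whatever remains above the way-sized index. The function names are hypothetical.

```python
def l2_tag_bits(cache_bytes, ways=2, address_bits=32):
    """Tag width, assuming the tag is the address above one way's worth of bytes."""
    way_bytes = cache_bytes // ways
    if way_bytes <= 0 or way_bytes & (way_bytes - 1):
        raise ValueError("way size must be a power of two")
    index_and_offset_bits = way_bytes.bit_length() - 1  # log2 of the way size
    return address_bits - index_and_offset_bits


def l2_tag_mask(cache_bytes):
    """Mask covering the upper tag bits of a 32-bit address."""
    bits = l2_tag_bits(cache_bytes)
    return ((1 << bits) - 1) << (32 - bits)
```

Under these assumptions a 512-Kbyte cache gives a tag mask of 0xFFFC_0000, which is the all-ones pattern used in the test pseudocode for that cache size.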


For example, to test all the tag bits of a 512-Kbyte cache, a test program needs to do the 
following: 


Initialize the test sequence by disabling address translation to invoke the default 
WIMG bit settings of 0b0011. 


Set the L2CR[L2DO] and L2CR[L2TS] bits and perform an invalidation of the L1 
data cache and the L2 cache. The L1 instruction cache may remain enabled for 
efficiency. 

Enable the L2 cache and the L1 data cache. 


Perform a series of dcbz, stw, and debf operations to fill the cache with unique data. 
Fill way 0 of the tag with data consisting of all zeros, and fill way | of the tag with 
data consisting of all ones. The following pseudocode illustrates this procedure: 








cache_size = (512 * 1024)       // 512 Kbyte 
cache_line_size = 32            // 32 byte cache line size for 750 
tag_bits = 14                   // for 512 Kbyte cache 

r10 = 0x00000000                // all zeros in upper tag_bits bits 
r11 = 0xFFFC0000                // all ones in upper tag_bits bits 
r12 = 0                         // index 
for (i = 0; i < ((cache_size / 2) / cache_line_size); i++) 
{ 
    dcbz r10,r12                // zero out line in L1 dcache 
    add  r13,r10,r12            // create unique data 
    stwx r13,r10,r12            // store unique data in newly 
                                // allocated L1 cache entry 
    dcbf r10,r12                // push data to L2 cache WAY 0 
    dcbz r11,r12                // zero out line in L1 dcache 
    add  r13,r11,r12            // create unique data 
    stwx r13,r11,r12            // store unique data in newly 
                                // allocated L1 cache entry 
    dcbf r11,r12                // push data to L2 cache WAY 1 
    r12 += cache_line_size      // go to next cache line and repeat 
} 
Disable the L1 data cache. 


Read back the data just written and verify its correctness. Use the performance 
monitors to count load hits in the L2 to verify that the data came from the L2. The 
number of hits should equal the number of loads. 
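
The read-back check can be modeled on a host before it is written in assembly. The sketch below is a host-side model only, assuming the 512-Kbyte parameters of the fill step; it relies on the fact that the fill loop stores the sum of base and index at that same address, so each word's expected value is its own address. The function names are hypothetical.

```python
def l2_fill_addresses(cache_size=512 * 1024, line_size=32, tag_bits=14):
    """Yield the addresses written by the fill loop, in loop order."""
    way_size = cache_size // 2
    zeros_base = 0x00000000                              # r10 in the pseudocode
    ones_base = ((1 << tag_bits) - 1) << (32 - tag_bits)  # r11 in the pseudocode
    for offset in range(0, way_size, line_size):          # r12 in the pseudocode
        yield zeros_base + offset   # line pushed to way 0
        yield ones_base + offset    # line pushed to way 1


def expected_readback(cache_size=512 * 1024, line_size=32, tag_bits=14):
    """Map each written address to the data the fill loop stored there."""
    return {addr: addr for addr in
            l2_fill_addresses(cache_size, line_size, tag_bits)}


if __name__ == "__main__":
    table = expected_readback()
    # One load per written line, so this is also the expected L2 load-hit count.
    print(len(table), "loads expected to hit in the L2")
```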


Attempt a series of loads from the cache with addresses that should not be in the tag 
by walking a one through the upper tag bits: 


r15 = 0x80000000                // address with a 1 in the top bit 
for (i = 0; i < tag_bits; i++) 
{ 
    initialize/enable the performance monitor counters to count load hits 
    r12 = 0x00000000            // index 
    for (j = 0; j < ((cache_size / 2) / cache_line_size); j++) 
    { 
        lwzx r13,r15,r12        // attempt to load data 
        r12 += cache_line_size  // go to the next cache line 
    } 
    disable the performance monitors, check to ensure that there were no 
    load hits 
    r15 = r15 >> 1              // shift the one bit 
                                // to the right for the next iteration 
} 
Then perform a similar series of loads, this time by walking a zero through a series 
of addresses with ones in the upper tag bits. For a 512-Kbyte cache, the first pass 
through the inner loop above uses the start address 0x7FFC_0000, the second pass uses 
the start address 0xBFFC_0000, the third 0xDFFC_0000, and so on for each tag bit. If 
there are any load hits at any point in the loop, there is a faulty 
tag in the cache. 
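
The start addresses for both passes can be generated mechanically. The following is a small host-side sketch, assuming a 32-bit address with the tag in the upper tag_bits bits; the helper names are illustrative and not part of the manual's pseudocode.

```python
def walking_one_addresses(tag_bits: int) -> list[int]:
    """Start addresses with a single 1 walked through the upper tag bits."""
    return [0x80000000 >> i for i in range(tag_bits)]


def walking_zero_addresses(tag_bits: int) -> list[int]:
    """Start addresses with all tag bits set except one walked 0."""
    tag_mask = ((1 << tag_bits) - 1) << (32 - tag_bits)
    return [tag_mask & ~(0x80000000 >> i) for i in range(tag_bits)]


if __name__ == "__main__":
    for addr in walking_one_addresses(14):
        print(f"walking one : 0x{addr:08X}")
    for addr in walking_zero_addresses(14):
        print(f"walking zero: 0x{addr:08X}")
```

None of these addresses has a tag of all zeros or all ones, which is why every load in a fault-free cache is expected to miss.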


Repeat the entire process, this time with all ones in the way 0 tag entries, and all 
zeros in the way 1 tag entries. (r10 = 0xFFFC_0000 and r11 = 0x0000_0000 in the 
pseudocode for the fourth step above for a 512-Kbyte cache.) 


Caution: For these L2 cache tests, instruction translation is disabled and the L1 instruction 
cache is enabled. This means that WIMG defaults to 0b0011. Even though the L2 cache is 
in data-only mode, if an address in the L2 matches an instruction access, the L2 will hit and 
provide data for that access. 


Therefore, cache test programs should avoid loading the L2 with address ranges that match 
the memory location of the test code. Otherwise, instruction accesses will hit on test data 
and cause random program behavior. For the test procedure described here, the test program 
should be located outside the address ranges 0x0000_0000 + cache_size and 
0xFFFF_FFFF - cache_size. 
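
A link-time or build-time check can catch a badly placed test image. The sketch below assumes one reading of the restriction above: the low range runs from 0x0000_0000 up to 0x0000_0000 + cache_size and the high range runs from 0xFFFF_FFFF - cache_size up to 0xFFFF_FFFF, both inclusive. That interpretation and the function names are assumptions.

```python
def reserved_ranges(cache_size: int) -> list[tuple[int, int]]:
    """Inclusive address ranges that the L2 test may fill with data."""
    low = (0x00000000, 0x00000000 + cache_size)
    high = (0xFFFFFFFF - cache_size, 0xFFFFFFFF)
    return [low, high]


def code_is_safe(code_start: int, code_end: int, cache_size: int) -> bool:
    """True if the inclusive range code_start..code_end avoids both ranges."""
    return all(code_end < lo or code_start > hi
               for lo, hi in reserved_ranges(cache_size))


if __name__ == "__main__":
    size = 512 * 1024
    print(code_is_safe(0x00100000, 0x0010FFFF, size))  # image at 1 Mbyte
    print(code_is_safe(0x00001000, 0x00001FFF, size))  # image inside low range
```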


The entire L2 cache may be tested by clearing L2CR[L2DO] and L2CR[L2TS], restoring 
the L1 and L2 caches to their normal operational state, and executing a comprehensive test 
program designed to exercise all the caches. The test program should include operations 
that cause L2 hit, reload, and castout activity that can be subsequently verified through the 
performance monitor. 


Most of the tests described in this section only use the performance monitors to verify the 
number of cache hits that occurred during the test. While the performance monitors also 
provide facilities for counting L2 cache misses, this facility is only useful for counting L2 
cache misses that cause burst reads to memory to occur. With the L1 data cache disabled 
and the L2CR[L2TS] bit set, all accesses are single-beat and therefore are not counted by 
the MPC750's performance monitor as L2 cache misses. The performance monitors can 
only be used to count misses when the L1 cache is enabled. 


C.11.7 L2 Cache SRAM Timing Examples 


Chapter 9, “L2 Cache Interface Operation,” describes the signal timing for the three types 
of SRAM (flow-through burst SRAM, pipelined burst SRAM, and late-write SRAM) 
supported by the MPC750 L2 cache interface. This section provides example timing 
diagrams for the new PB3 synchronous burst SRAMs supported by the MPC755. The 
timing diagrams illustrate the best case logical (ideal, not AC-timing accurate) interface 
operations. For proper interface operation, the designer must select SRAMs that support the 
signal sequencing illustrated in the timing diagrams. Note that the PB3 SRAMs operate 
differently from the PB2 SRAMs, and require a different configuration setting in L2CR. 


PB3 SRAMs provide the efficiencies of the late-write SRAMs, but operate more like 
traditional PB2 SRAMs (that is, there is no internal write queue). They may be available at 
speeds comparable to late-write SRAMs, but closer to PB2 prices. They achieve their 
speed/price benefits by staging the initial internal array access over two clock cycles, 
thereby requiring an additional wait state for the first read data beat. 


C.11.7.1 Pipelined PB3 Burst SRAM 


Pipelined burst SRAMs operate at higher frequencies than flow-through burst SRAMs by 
clocking the read data from the memory array into a buffer before driving the data onto the 
data bus. This causes initial read accesses by the pipelined burst SRAMs to occur one cycle 
later than flow-through burst SRAMs, but the L2 bus frequencies supported can be higher. 
Note that the MPC750 L2 cache interface requires the use of single-cycle deselect pipelined 
burst SRAMs for proper operation. Some PB3 SRAM devices have strobes with data 
latches that allow for very late clocking. The MPC755 doesn’t support this feature. The 
MPC755 supports strobeless use of the PB3 devices and all timing (including setup times) 
must meet the specifications described in the MPC755 Hardware Specifications. 


Figure C-19 shows a burst read-read-read memory access sequence when the L2 cache 
interface is configured with PB3 burst SRAMs. 


[Timing diagram not reproduced. Clock cycles 1 through 21; signals shown: SRAMClk, L2CE, 
L2WE, SRAMAddress, SRAMMemory, SRAMData; three consecutive burst reads.] 

Notes: 
For PB3, L2ZZ is reused as L2ADS and asserts during the 1st clock only of each L2CE assertion. 

For PB3, the internal array access requires 1 cycle to row select, 1 cycle for each column select of burst (a-d), 
and 1 cycle to deselect if write. 


Figure C-19. Burst Read-Read-Read L2 Cache Access (Pipelined) 


Figure C-20 shows a burst write-write-write memory access sequence when the L2 cache 
interface is configured with PB3 burst SRAMs. 


[Timing diagram not reproduced. Clock cycles 1 through 21; recoverable signal labels: 
SRAMClk, SRAMData.] 

Notes: 
For PB3, L2ZZ is reused as L2ADS and asserts during the 1st clock only of each L2CE assertion. 

For PB3, the internal array access requires 1 cycle to row select, 1 cycle for each column select of burst (a-d), 
and 1 cycle to deselect if write. 


Figure C-20. Burst Write-Write-Write L2 Cache Access (Pipelined) 


Figure C-21 shows a burst read-write-read memory access sequence when the L2 cache 
interface is configured with PB3 burst SRAMs. 


[Timing diagram not reproduced. Clock cycles 1 through 21; recoverable signal labels: 
SRAMAddress, SRAMMemory, SRAMData.] 

Notes: 
For PB3, L2ZZ is reused as L2ADS and asserts during the 1st clock only of each L2CE assertion. 

For PB3, the internal array access requires 1 cycle to row select, 1 cycle for each column select of burst (a-d), 
and 1 cycle to deselect if write. 


Figure C-21. Burst Read-Write-Read L2 Cache Access (Pipelined) 


C.11.8 Private Memory SRAM Timing 


The timing for private memory SRAM is the same as the L2 cache timing described in 
Section C.11.7, “L2 Cache SRAM Timing Examples.” 


C.12 Power and Thermal Management (Chapter 10) 


The power and thermal management of the MPC755 functions the same as that of the 
MPC750, and is completely described in Chapter 10, “Power and Thermal Management,” 
except for the restriction on global L2 cache invalidation described in Section C.11.6.2, “L2 
Cache Global Invalidation.” Additionally, for both the MPC750 and MPC755, no 
combination of the thermal assist unit, the decrementer register, and the performance 
monitor can be used at any one time. If exceptions for any two of these functional blocks 
are enabled together, multiple exceptions caused by any of these three blocks cause 
unpredictable results. 


C.13 Performance Monitor (Chapter 11) 


The performance monitor of the MPC755 functions the same as that of the MPC750, and 
is completely described in Chapter 11, “Performance Monitor,” except that for both the 
MPC750 and MPC755, no combination of the thermal assist unit, the decrementer register, 
and the performance monitor can be used at any one time. If exceptions for any two of these 
functional blocks are enabled together, multiple exceptions caused by any of these three 
blocks cause unpredictable results. 


Appendix D 
User’s Manual Revision History 
for the MPC750 RISC Microprocessor Family 


This appendix provides a list of the major differences between Revision 0 and Revision 1 
of the MPC750 RISC Microprocessor User’s Manual. These corrections also apply to the 
MPC740, the MPC755, and the MPC745, which are described in MPC750 RISC 
Microprocessor Family User’s Manual. For convenience, the section number and page 
number of the errata item in the original user’s manual are provided. Note that the list only 
includes the major changes to the user’s manual. 


Section, Page       Change 

Throughout the UM   Added references to Appendix C, “MPC755 Embedded G3 
                    Microprocessor,” and added the appendix. 

1.1, 1-3            In Figure 1-1, the 60x BIU is connected to L1 cache and the data path 
                    between the 60x BIU and L2 BIU is 64-bit. Also integer unit 1 should 
                    have only an add sign, and integer unit 2 should have the add, 
                    multiply, and divide signs. 

1.2.1, 1-4          Remove the multiply and divide instructions inside the parentheses 
                    of the IU2 description, so that the sentence reads as follows: 
                    “IU2 can execute all integer instructions except multiply and divide 
                    instructions (shift, rotate, arithmetic, and logical instructions).” 

2.1.1, 2-7          The implementation note for the decrementer register (DEC) should 
                    read as follows: 

                    “In the MPC750, the decrementer register is decremented and the 
                    time base is incremented at a speed that is one-fourth the speed of the 
                    bus clock.” 

2.1.2.2, 2-9        In Figure 2-3, the DBP bit in HID0 register should not be reserved. 

2.1.2.2, 2-9        In Table 2-4, replace the description of HID0[DBP] (bit 1) with the 
                    following: 

                    Bit | Name | Function 
                    1   | DBP  | Disable 60x bus address and data parity generation. 
                        |      | 0  The system generates address and data parity. 
                        |      | 1  Parity generation is disabled and parity signals are driven to 0 
                        |      |    during bus operations. When parity generation is disabled, all 
                        |      |    parity checking should also be disabled and parity signals need 
                        |      |    not be connected. 

                    Replace the description of HID0[BTIC] (bit 26) with the following: 

                    Bit | Name | Function 
                    26  | BTIC | BTIC enable. Used to enable use of the 64-entry branch 
                        |      | instruction cache. 
                        |      | 0  The BTIC contents are invalidated and the BTIC behaves as if 
                        |      |    it were empty. New entries cannot be added until the BTIC is 
                        |      |    enabled. 
                        |      | 1  The BTIC is enabled and new entries can be added. 

2.1.2.2, 2-12       In Table 2-4, the description of HID0[IFEM] (bit 23) should read as 
                    follows for the setting of zero: 

                    Bit | Name | Function 
                    23  | IFEM | Enable M bit on bus for instruction fetches. 
                        |      | 0  M bit not reflected on bus for instruction fetches. Instruction 
                        |      |    fetches are treated as nonglobal on the bus. 
                        |      | 1  Instruction fetches reflect the M bit from the WIM settings. 
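
The HID0 bit numbers in these corrections use the PowerPC convention in which bit 0 is the most-significant bit of the 32-bit register. The sketch below converts the bit numbers quoted above (DBP = 1, IFEM = 23, BTIC = 26) into masks; the helper names are illustrative, and the register value is modeled as a plain integer rather than read with mfspr.

```python
def ppc_bit_mask(bit: int, width: int = 32) -> int:
    """Mask for a PowerPC-numbered bit (bit 0 is the most-significant bit)."""
    return 1 << (width - 1 - bit)


HID0_DBP = ppc_bit_mask(1)    # disable 60x bus parity generation
HID0_IFEM = ppc_bit_mask(23)  # reflect M bit on bus for instruction fetches
HID0_BTIC = ppc_bit_mask(26)  # enable the branch target instruction cache


def set_bits(value: int, mask: int) -> int:
    """Return value with the masked bits set."""
    return (value | mask) & 0xFFFFFFFF


def clear_bits(value: int, mask: int) -> int:
    """Return value with the masked bits cleared."""
    return value & ~mask & 0xFFFFFFFF


if __name__ == "__main__":
    hid0 = set_bits(0, HID0_BTIC | HID0_IFEM)
    print(f"HID0 = 0x{hid0:08X}")
```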


















































2.1.2.4.5, 2-18     Replace Table 2-11 with the following: 

Encoding   | Description 
00 0000    | Register holds current value. 
00 0001    | Counts processor cycles. 
00 0010    | Counts completed instructions. Does not include folded branches. 
00 0011    | Counts transitions from 0 to 1 of TBL bits specified through 
           | MMCR0[RTCSELECT]. 00 = 47, 01 = 51, 10 = 55, 11 = 63. 
00 0100    | Counts instructions dispatched. 0, 1, or 2 instructions per cycle. 
00 0101    | Counts L1 instruction cache misses. 
00 0110    | Counts ITLB misses. 
00 0111    | Counts L2 instruction misses. 
00 1000    | Counts branches predicted or resolved not taken. 
00 1001    | Counts MSR[PR] bit toggles. 
00 1010    | Counts times reserved load operations completed. 
00 1011    | Counts completed load and store instructions. 
00 1100    | Counts snoops to the L1 and the L2. 
00 1101    | Counts L1 cast-outs to the L2. 
00 1110    | Counts completed system unit instructions. 
00 1111    | Counts instruction fetch misses in the L1. 
01 0000    | Counts branches allowing out-of-order execution that resolved correctly. 
All others | Reserved. 


2.3.4.3.10, 2-52 

In Table 2-18, replace the description of L2CR[L2SL] (bit 16) with 
the following: 

Bit | Name | Function 
16  | L2SL | L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay 
    |      | line. It is intended to increase the delay through the DLL to accommodate 
    |      | slower L2 RAM bus frequencies. Generally, L2SL should be set if the L2 RAM 
    |      | interface is operated below 150 MHz. 








Add the following footnote for the stfd instruction in Table 2-39: 


The MPC750 and MPC755 require that the FPRs be initialized with 
floating-point values before the stfd instruction is used. Otherwise, a 
random power-on value for an FPR may cause unpredictable device 
behavior when the stfd instruction is executed. Note that any 
floating-point value loaded into the FPRs is acceptable. 


Add the following note as a footnote to the mtsr and mtsrin 
instructions in Table 2-59: 


The MPC750 and MPC755 have a restriction on the use of the mtsr 
and mtsrin instructions not described in the Programming 
Environments Manual. The MPC750 and MPC755 require that an 
isync instruction be executed after either an mtsr or mtsrin 
instruction. This isync instruction must occur after the execution of 
the mtsr or mtsrin and before the data address translation 
mechanism uses any of the on-chip segment registers. 


Add the following to the end of the section: 


Both the MPC750 and MPC755 processors require protection in the 
use of the dcbz instruction in order to guarantee cache coherency in 
a multiprocessor system. Specifically, the dcbz instruction must be: 


• Either enveloped by high-level software synchronization protocols 
(such as semaphores), or 


• Preceded by execution of a dcbf instruction to the same address. 


One of these precautions must be taken in order to guarantee that 
there are no simultaneous cache hits from a dcbz instruction and a 
snoop to that address. If these two events occur simultaneously, stale 
data may occur, causing system failures. 


The machine check exception in Table 4-3 should read as 
follows: 


Table 4-3. MPC750 Exception Priorities 

Priority | Exception     | Cause 
Asynchronous Exceptions (Interrupts) 
1        | Machine check | Any enabled machine check condition (L2 data parity error, assertion of TEA or MCP) 








4.5.2, 4-14         Remove the reference to parity error in L1 cache from machine 
                    check exception conditions. Thus, the first paragraph of this section 
                    should read as follows: 

                    The MPC750 implements the machine check exception as defined in 
                    the PowerPC architecture (OEA). It conditionally initiates a machine 
                    check exception after an address or data parity error occurs on the 
                    bus or in the L2 cache, after receiving a qualified transfer error 
                    acknowledge (TEA) indication on the MPC750 bus, or after the 
                    machine check interrupt (MCP) signal has been asserted. As defined 
                    in the OEA, the exception is not taken if MSR[ME] is cleared, in 
                    which case the processor enters the checkstop state. 

4.5.11, 4-20        Remove Table 4-10, “Trace Exception - SRR1 Settings,” and its 
                    introductory text. This interrupt is implemented as defined by the 
                    OEA. 

5.1.7, 5-18         In Table 5-4, delete the second row in the table (lwarx or stwcx. with 
                    W = 1). 

5.1.8, 5-19         In Table 5-5, remove the next-to-last paragraph (“In addition, 
                    depending...”) from the tlbie description. 

5.4.3.1, 5-26       The next-to-last paragraph should read as follows: 

                    The TLB entries are on-chip copies of PTEs in the page tables in 
                    memory and are similar in structure. To uniquely identify a TLB 
                    entry as the required PTE, the TLB entry also contains four more bits 
                    of the page index, EA[10-13] (in addition to the API bits in the PTE). 

5.4.3.1, 5-27       The second sentence in the second paragraph should read as follows: 

                    ITLB miss exception conditions are reported when there are no more 
                    instructions to be dispatched or retired (the pipeline is empty). 

5.4.3.1, 5-27       The second sentence in the fourth paragraph should read as follows: 

                    Thus, TLB entries must be explicitly cleared by the system software 
                    (with the tlbie instruction) before address translation is enabled. 

5.4.4, 5-29         Figure 5-8 in the original manual incorrectly shows the loopback 
                    arrow on the left side pointing to the node above the word 
                    ‘Otherwise’. Replace Figure 5-8 with the following: 

                    [Flowchart not reproduced. Recoverable node text, in reading order: 
                    Effective Address Generated; Generate 52-Bit Virtual Address from 
                    Segment Descriptor; Page Address Translation (See Figure 5-6); 
                    Compare Virtual Address with TLB Entries; TLB Hit Case; Instruction 
                    Fetch with N-Bit Set in Segment Descriptor (No-Execute); dcbz 
                    Instruction with W or I = 1; Alignment Exception; Otherwise; Check 
                    Page Memory Protection Violation Conditions (See The Programming 
                    Environments Manual); Access Permitted; Access Prohibited; Page 
                    Memory Protection Violation; Store Access with PTE[C] = 0; Page 
                    Table Search Operation (See Figure 5-9); PA[0-31] <- RPN || A[20-31]; 
                    Continue Access to Memory Subsystem with WIMG-Bits from PTE.] 

                    Figure 5-8. Page Address Translation Flow - TLB Hit 

6.7, 6-32           The last paragraph should read as follows: 


                    “Table 6-6 shows integer instruction latencies. Note that IU1 
                    executes all integer arithmetic instructions: multiply, divide, shift, 
                    rotate, arithmetic, and compare. IU2 executes all integer instructions 
                    except multiply and divide (that is, shift, rotate, logical, and 
                    compare).” 

6.7, 6-33           In Table 6-6, remove IU2 from the unit column for instructions 
                    mulhwu[.] and mulhw[.]. 

7.2.5.2.1, 7-14     For ARTRY, change “Timing Comments,” “Assertion,” to the 
                    following: 

                    Asserted the second bus cycle following the assertion of TS if a retry 
                    is required. 

7.2.5.2.1, 7-14     For ARTRY, change the first sentence of the first paragraph of 
                    “Timing Comments,” “Negation,” to the following: 

                    Negation/HighZ: Driven until the bus_clk cycle following the 
                    assertion of AACK. 

7.2.5.2.1, 7-14     For ARTRY, change the last sentence of the first paragraph, “Timing 
                    Comments,” “Negation,” to the following: 

                    First the buffer goes to high impedance for a minimum of one-half 
                    processor cycle (dependent on the clock mode); then it is driven 
                    negated for one-half bus cycle before returning to high impedance. 

T2962 5 1223        For SRESET, change “State Meaning,” “Asserted,” to the following: 

                    Does not initialize internal resources (different from HRESET 
                    assertion). However, initiates processing for a reset exception as 
                    described in Section 4.5.1, “System Reset Exception (0x00100),” 
                    (same as HRESET). 

8.3.1, 8-12         An overbar is missing for TS in the last sentence in the paragraph. 

8.3.2, 8-13         In Figure 8-6, the first signal should read as qualBG instead of 
                    qualBG. 

8.3.2.2.2, 8-14     Add the following paragraph to the end of this section: 

                    For operations generated by the eciwx/ecowx instructions, a transfer 
                    size of 4 bytes is implied, and the TBST and TSIZ[0:2] signals are 
                    redefined to specify the resource ID (RID). The RID is copied from 
                    bits 28-31 of the external access register (EAR). For these 
                    operations, the TBST signal carries the EAR[28] data without 
                    inversion (active high). 

8.3.2.4, 8-17       In Table 8-4, the fifth row in the TSIZ[0-2] column should read as 
                    010 instead of 011. 

9.1.2, 9-5          In Table 9-1, replace the description of L2CR[L2DO] (bit 9) with 
                    the following: 

                    Bit | Name | Function 
                    9   | L2DO | L2 data-only. Setting L2DO inhibits the caching of instructions 
                        |      | in the L2 cache. All accesses from the L1 instruction cache are 
                        |      | treated as cache-inhibited by the L2 cache (bypass L2 cache, no 
                        |      | L2 tag look-up performed). 

                    In Table 9-1, replace the description of L2CR[L2SL] (bit 16) with 
                    the following: 

                    Bit | Name | Function 
                    16  | L2SL | L2 DLL slow. Setting L2SL increases the delay of each tap of the 
                        |      | DLL delay line. It is intended to increase the delay through the 
                        |      | DLL to accommodate slower L2 RAM bus frequencies. Generally, 
                        |      | L2SL should be set if the L2 RAM interface is operated at a 
                        |      | frequency below the value specified in the MPC750 
                        |      | Hardware Specifications. 
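
The two L2CR corrections above can be expressed as a small configuration helper. This is a sketch that models L2CR as a plain 32-bit integer with PowerPC bit numbering (bit 0 most significant); the function name and the threshold parameter are hypothetical, and the threshold value itself must come from the hardware specifications (the Table 2-18 correction quotes 150 MHz).

```python
L2CR_L2DO = 1 << (31 - 9)    # bit 9: L2 data-only
L2CR_L2SL = 1 << (31 - 16)   # bit 16: L2 DLL slow


def configure_l2cr(l2cr: int, l2_bus_mhz: float, l2sl_threshold_mhz: float,
                   data_only: bool = False) -> int:
    """Return an L2CR image with L2SL and L2DO chosen from the arguments."""
    l2cr &= ~(L2CR_L2DO | L2CR_L2SL) & 0xFFFFFFFF
    if l2_bus_mhz < l2sl_threshold_mhz:
        l2cr |= L2CR_L2SL    # slow L2 RAM bus: lengthen each DLL tap
    if data_only:
        l2cr |= L2CR_L2DO    # instruction fetches bypass the L2
    return l2cr


if __name__ == "__main__":
    image = configure_l2cr(0, l2_bus_mhz=100, l2sl_threshold_mhz=150,
                           data_only=True)
    print(f"L2CR = 0x{image:08X}")
```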








9.1.4, 9-7          Add the following to the end of the first paragraph of this section, 
                    with the new step shown below inserted at the beginning of the 
                    bulleted list: 

                    “Note that the MPC750 must be operating at full power (low power 
                    modes disabled) in order to perform L2 cache invalidation. 

                    The sequence for performing a global invalidation of the L2 cache is 
                    as follows: 

                    • Clear HID0[DPM] bit to zero. Dynamic power management must 
                      be disabled.” 

                    The rest of the bulleted list for the sequence then follows. 

9.1.7.1, 9-10-11    In Figure 9-2, Figure 9-3, and Figure 9-4, change the L2CE and L2WE 
                    signals to L2CE and L2WE with overbars. 

9.1.7.2, 9-11-12    In Figure 9-5, Figure 9-6, and Figure 9-7, change the L2CE and L2WE 
                    signals to L2CE and L2WE with overbars. 

9.1.7.3, 9-12-14    In Figure 9-8, Figure 9-9, and Figure 9-10, change the L2CE and L2WE 
                    signals to L2CE and L2WE with overbars. 


11.2.1.5, 11-7      Replace Table 11-6 with the following (this erratum also applies to 
                    the MPC755): 


Table 11-6. PMC2 Events - MMCR0[26-31] Select Encodings 

Encoding   | Description 
00 0000    | Register holds current value. 
00 0001    | Counts processor cycles. 
00 0010    | Counts completed instructions. Does not include folded branches. 
00 0011    | Counts transitions from 0 to 1 of TBL bits specified through 
           | MMCR0[RTCSELECT]. 00 = 47, 01 = 51, 10 = 55, 11 = 63. 
00 0100    | Counts instructions dispatched. 0, 1, or 2 instructions per cycle. 
00 0101    | Counts L1 instruction cache misses. 
00 0110    | Counts ITLB misses. 
00 0111    | Counts L2 instruction misses. 
00 1000    | Counts branches predicted or resolved not taken. 
00 1001    | Counts MSR[PR] bit toggles. 
00 1010    | Counts times reserved load operations completed. 
00 1011    | Counts completed load and store instructions. 
00 1100    | Counts snoops to the L1 and the L2. 
00 1101    | Counts L1 cast-outs to the L2. 
00 1110    | Counts completed system unit instructions. 
00 1111    | Counts instruction fetch misses in the L1. 
01 0000    | Counts branches allowing out-of-order execution that resolved correctly. 
All others | Reserved. 
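
Test software that programs PMC2 can carry the table above as data and reject reserved encodings. The sketch below models MMCR0 as a plain 32-bit integer; because PowerPC numbers bit 0 as the most-significant bit, MMCR0[26-31] is the low six bits. The names PMC2_EVENTS and set_pmc2_select are illustrative and not defined by the manual.

```python
PMC2_EVENTS = {
    0b000000: "Register holds current value",
    0b000001: "Processor cycles",
    0b000010: "Completed instructions (folded branches excluded)",
    0b000011: "0-to-1 transitions of the TBL bit chosen by MMCR0[RTCSELECT]",
    0b000100: "Instructions dispatched",
    0b000101: "L1 instruction cache misses",
    0b000110: "ITLB misses",
    0b000111: "L2 instruction misses",
    0b001000: "Branches predicted or resolved not taken",
    0b001001: "MSR[PR] bit toggles",
    0b001010: "Reserved load operations completed",
    0b001011: "Completed load and store instructions",
    0b001100: "Snoops to the L1 and the L2",
    0b001101: "L1 cast-outs to the L2",
    0b001110: "Completed system unit instructions",
    0b001111: "Instruction fetch misses in the L1",
    0b010000: "Out-of-order-allowing branches that resolved correctly",
}

# TBL bit selected by each MMCR0[RTCSELECT] value, as listed in the table.
RTCSELECT_TBL_BIT = {0b00: 47, 0b01: 51, 0b10: 55, 0b11: 63}

PMC2SELECT_MASK = 0x3F  # MMCR0[26-31] is the low six bits of the register


def set_pmc2_select(mmcr0: int, encoding: int) -> int:
    """Return an MMCR0 image with the PMC2 event select field replaced."""
    if encoding not in PMC2_EVENTS:
        raise ValueError(f"reserved PMC2 encoding: {encoding:#08b}")
    return (mmcr0 & ~PMC2SELECT_MASK & 0xFFFFFFFF) | encoding


if __name__ == "__main__":
    image = set_pmc2_select(0, 0b000010)
    print(f"MMCR0 = 0x{image:08X} ({PMC2_EVENTS[image & PMC2SELECT_MASK]})")
```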
Glossary of Terms and Abbreviations 


The glossary contains an alphabetical list of terms, phrases, and abbreviations used in this 
book. Some of the terms and definitions included in the glossary are reprinted from IEEE 
Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, copyright ©1985 by 
the Institute of Electrical and Electronics Engineers, Inc. with the permission of the IEEE. 





A Architecture. A detailed specification of requirements for a processor or 
computer system. It does not specify details of how the processor or 
computer system must be implemented; instead it provides a 
template for a family of compatible implementations. 


Asynchronous exception. Exceptions that are caused by events external to 
the processor’s execution. In this document, the term ‘asynchronous 
exception’ is used interchangeably with the word interrupt. 


Atomic access. A bus access that attempts to be part of a read-write operation 
to the same address uninterrupted by any other access to that address 
(the term refers to the fact that the transactions are indivisible). The 
PowerPC architecture implements atomic accesses through the 
lwarx/stwcx. instruction pair. 





B BAT (block address translation) mechanism. A software-controlled array 
that stores the available block address translations on-chip. 


Biased exponent. An exponent whose range of values is shifted by a constant 
(bias). Typically a bias is provided to allow a range of positive values 
to express a range that includes both positive and negative values. 
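
As an illustration of a biased exponent, the sketch below extracts the exponent field of an IEEE 754 single-precision value, whose bias is 127. The choice of single precision and the function name are assumptions made for the example.

```python
import struct

SINGLE_BIAS = 127  # exponent bias of the IEEE 754 single-precision format


def single_exponent_fields(x: float) -> tuple[int, int]:
    """Return (biased exponent field, unbiased exponent) of x as a single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    biased = (bits >> 23) & 0xFF   # 8-bit exponent field, always non-negative
    return biased, biased - SINGLE_BIAS


if __name__ == "__main__":
    for value in (0.5, 1.0, 8.0):
        biased, unbiased = single_exponent_fields(value)
        print(f"{value}: stored exponent {biased}, actual exponent {unbiased}")
```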


Big-endian. A byte-ordering method in memory where the address n of a 
word corresponds to the most-significant byte. In an addressed 
memory word, the bytes are ordered (left to right) 0, 1, 2, 3, with 0 
being the most-significant byte. See Little-endian. 
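
The byte ordering can be seen by packing a word both ways. This is a host-side sketch using Python's struct module; the sample value is arbitrary.

```python
import struct

word = 0x0A0B0C0D

big = struct.pack(">I", word)     # big-endian: byte 0 is most significant
little = struct.pack("<I", word)  # little-endian: byte 0 is least significant

if __name__ == "__main__":
    print("big-endian bytes   :", big.hex(" "))
    print("little-endian bytes:", little.hex(" "))
```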


Block. An area of memory that ranges from 128 Kbyte to 256 Mbyte whose 
size, translation, and protection attributes are controlled by the BAT 
mechanism. 


Boundedly undefined. A characteristic of certain operation results that are 
not rigidly prescribed by the PowerPC architecture. 
Boundedly-undefined results for a given operation may vary among 
implementations and between execution attempts in the same 
implementation. 


Although the architecture does not prescribe the exact behavior for 
when results are allowed to be boundedly undefined, the results of 
executing instructions in contexts where results are allowed to be 
boundedly undefined are constrained to ones that could have been 
achieved by executing an arbitrary sequence of defined instructions, 
in valid form, starting in the state the machine was in before 
attempting to execute the given instruction. 


Branch folding. The replacement with target instructions of a branch 
instruction and any instructions along the not-taken path when a 
branch is either taken or predicted as taken. 


Branch prediction. The process of guessing whether a branch will be taken. 
Such predictions can be correct or incorrect; the term ‘predicted’ as 
it is used here does not imply that the prediction is correct 
(successful). The PowerPC architecture defines a means for static 
branch prediction as part of the instruction encoding. 


Branch resolution. The determination of whether a branch is taken or not 
taken. A branch is said to be resolved when the processor can 
determine which instruction path to take. If the branch is resolved as 
predicted, the instructions following the predicted branch that may 
have been speculatively executed can complete (see completion). If 
the branch is not resolved as predicted, instructions on the 
mispredicted path, and any results of speculative execution, are 
purged from the pipeline and fetching continues from the 
nonpredicted path. 


Burst. A multiple-beat data transfer whose total size is typically equal to a 
cache block. 





Cache. High-speed memory containing recently accessed data and/or 
instructions (subset of main memory). 


Cache block. A small region of contiguous memory that is copied from 
memory into a cache. The size of a cache block may vary among 
processors; the maximum block size is one page. In processors that 
use the PowerPC architecture, cache coherency is maintained on a 
cache-block basis. Note that the term ‘cache block’ is often used 
interchangeably with ‘cache line’. 


Cache coherency. An attribute wherein an accurate and common view of 
memory is provided to all devices that share the same memory 
system. Caches are coherent if a processor performing a read from 
its cache is supplied with data corresponding to the most recent value 
written to memory or to another processor’s cache. 


Cache flush. An operation that removes from a cache any data from a 
specified address range. This operation ensures that any modified 
data within the specified address range is written back to main 
memory. This operation is generated typically by a Data Cache 
Block Flush (dcbf) instruction. 


Caching-inhibited. A memory update policy in which the cache is bypassed 
and the load or store is performed to or from main memory. 


Cast-outs. Cache blocks that must be written to memory when a cache miss 
causes a cache block to be replaced. 


Changed bit. One of two page history bits found in each page table entry 
(PTE). The processor sets the changed bit if any store is performed 
into the page. See also Page access history bits and Referenced bit. 


Clear. To cause a bit or bit field to register a value of zero. See also Set. 


Completion. Completion occurs when an instruction has finished executing, 
written back any results, and is removed from the completion queue. 
When an instruction completes, it is guaranteed that this instruction 
and all previous instructions can cause no exceptions. 


Context synchronization. An operation that ensures that all instructions in 
execution complete past the point where they can produce an 
exception, that all instructions in execution complete in the context 
in which they began execution, and that all subsequent instructions 
are fetched and executed in the new context. Context synchronization 
may result from executing specific instructions (such as isync or rfi) 
or when certain events occur (such as an exception). 


Copy-back. An operation in which modified data in a cache block is copied 
back to memory. 


Denormalized number. A nonzero floating-point number whose exponent 
has a reserved value, usually the format's minimum, and whose 
explicit or implicit leading significand bit is zero. 


Direct-mapped cache. A cache in which each main memory address can 
appear in only one location within the cache. It operates more quickly 
when the memory request is a cache hit. 





Effective address (EA). The 32- or 64-bit address specified for a load, store, 
or an instruction fetch. This address is then submitted to the MMU 
for translation to either a physical memory address or an I/O address. 


Exception. A condition encountered by the processor that requires special, 
supervisor-level processing. 


Exception handler. A software routine that executes when an exception is 
taken. Normally, the exception handler corrects the condition that 
caused the exception, or performs some other meaningful task (that 
may include aborting the program that caused the exception). The 
address for each exception handler is identified by an exception 
vector offset defined by the architecture and a prefix selected via the 
MSR. 


Exclusive state. MEI state (E) in which only one caching device contains 
data that is also in system memory. 


Execution synchronization. A mechanism by which all instructions in 
execution are architecturally complete before beginning execution 
(appearing to begin execution) of the next instruction. Similar to 
context synchronization but does not force the contents of the 
instruction buffers to be deleted and refetched. 


Exponent. In the binary representation of a floating-point number, the 
exponent is the component that normally signifies the integer power 
to which the value two is raised in determining the value of the 
represented number. See also Biased exponent. 





Fall-through (branch fall-through). A not-taken branch. On the MPC750, 
fall-through branch instructions are removed from the instruction 
stream at dispatch. That is, these instructions are allowed to fall 
through the instruction queue via the dispatch mechanism, without 
either being passed to an execution unit or given a position in the 
completion queue. 


Fetch. Retrieving instructions from either the cache or main memory and 
placing them into the instruction queue. 


Floating-point register (FPR). Any of the 32 registers in the 
floating-point register file. These registers provide the source 
operands and destination results for floating-point instructions. Load 
instructions move data from memory to FPRs and store instructions 
move data from FPRs to memory. The FPRs are 64 bits wide and 
store floating-point values in double-precision format. 


Flush. An operation that causes a modified cache block to be invalidated and 
the data to be written to memory. 


Fraction. In the binary representation of a floating-point number, the field of 
the significand that lies to the right of its implied binary point. 





General-purpose register (GPR). Any of the 32 registers in the 
general-purpose register file. These registers provide the source 
operands and destination results for all integer data manipulation 
instructions. Integer load instructions move data from memory to 
GPRs and store instructions move data from GPRs to memory. 


Guarded. The guarded attribute pertains to out-of-order execution. When a 
page is designated as guarded, instructions and data cannot be 
accessed out-of-order. 


Harvard architecture. An architectural model featuring separate caches for 
instruction and data. 


Hashing. An algorithm used in the page table search process. 





IEEE 754. A standard written by the Institute of Electrical and Electronics 
Engineers that defines operations and representations of binary 
floating-point numbers. 


Illegal instructions. A class of instructions that are not implemented for a 
particular processor that uses the PowerPC architecture. These 
include instructions not defined by the PowerPC architecture. In 
addition, for 32-bit implementations, instructions that are defined 
only for 64-bit implementations are considered to be illegal 
instructions. For 64-bit implementations, instructions that are defined 
only for 32-bit implementations are considered to be illegal 
instructions. 


Implementation. A particular processor that conforms to the PowerPC 
architecture, but may differ from other architecture-compliant 
implementations, for example, in design, feature set, and 
implementation of optional features. The PowerPC architecture has 
many different implementations. 


Imprecise exception. A type of synchronous exception that is allowed not to 
adhere to the precise exception model (see Precise exception). The 
PowerPC architecture allows only floating-point exceptions to be 
handled imprecisely. 


Instruction queue. A holding place for instructions fetched from the current 
instruction stream. 


Integer unit. A functional unit in the MPC750 responsible for executing 
integer instructions. 


In-order. An aspect of an operation that adheres to a sequential model. An 
operation is said to be performed in-order if, at the time that it is 
performed, it is known to be required by the sequential execution 
model. See Out-of-order. 


Instruction latency. The total number of clock cycles necessary to execute 
an instruction and make ready the results of that instruction. 


Interrupt. An asynchronous exception. On processors that use the PowerPC 
architecture, interrupts are a special case of exceptions. See also 
asynchronous exception. 


Invalid state. State of a cache entry that does not currently contain a valid 
copy of a cache block from memory. 





Key bits. A set of key bits referred to as Ks and Kp in each segment register 
and each BAT register. The key bits determine whether supervisor or 
user programs can access a page within that segment or block. 


Kill. An operation that causes a cache block to be invalidated. 





L2 cache. See Secondary cache. 


Least-significant bit (lsb). The bit of least value in an address, register, data 
element, or instruction encoding. 


Least-significant byte (LSB). The byte of least value in an address, register, 
data element, or instruction encoding. 


Little-endian. A byte-ordering method in memory where the address n of a 
word corresponds to the least-significant byte. In an addressed 
memory word, the bytes are ordered (left to right) 3, 2, 1, 0, with 3 
being the most-significant byte. See Big-endian. 
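
The ordering described above can be observed with a short sketch that stores the same 32-bit word in both byte orders. The word value is an assumption chosen so that each byte's value matches its significance (0x03 is the most-significant byte, 0x00 the least).

```python
import struct

word = 0x03020100  # each byte's value names its significance

little = struct.pack("<I", word)  # little-endian layout in memory
big = struct.pack(">I", word)     # big-endian layout in memory

# Index 0 of each list is the byte stored at the lowest address n.
print(list(little))
print(list(big))
```

In the little-endian layout the byte at address n is the least-significant byte, whereas in the big-endian layout it is the most-significant byte.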





MESI (modified/exclusive/shared/invalid). Cache coherency protocol used 
to manage caches on different devices that share a memory system. 
Note that the PowerPC architecture does not specify the 
implementation of a MESI protocol to ensure cache coherency. 


Memory access ordering. The specific order in which the processor 
performs load and store memory accesses and the order in which 
those accesses complete. 


Memory-mapped accesses. Accesses whose addresses use the page or block 
address translation mechanisms provided by the MMU and that 
occur externally with the bus protocol defined for memory. 


Memory coherency. An aspect of caching in which it is ensured that an 
accurate view of memory is provided to all devices that share system 
memory. 


Memory consistency. Refers to agreement of levels of memory with respect 
to a single processor and system memory (for example, on-chip 
cache, secondary cache, and system memory). 


Memory management unit (MMU). The functional unit that is capable of 
translating an effective (logical) address to a physical address, 
providing protection mechanisms, and defining caching methods. 


Modified state. MEI state (M) in which one, and only one, caching device 
has the valid data for that address. The data at this address in external 
memory is not valid. See MESI. 


Most-significant bit (msb). The highest-order bit in an address, register, 
data element, or instruction encoding. 


Most-significant byte (MSB). The highest-order byte in an address, 
register, data element, or instruction encoding. 





NaN. An abbreviation for not a number; a symbolic entity encoded in 
floating-point format. There are two types of NaNs—signaling NaNs 
and quiet NaNs. 


No-op. No-operation. A single-cycle operation that does not affect registers 
or generate bus activity. 


Normalization. A process by which a floating-point value is manipulated 
such that it can be represented in the format for the appropriate 
precision (single- or double-precision). For a floating-point value to 
be representable in the single- or double-precision format, the 
leading implied bit must be a 1. 





OEA (operating environment architecture). The level of the architecture 
that describes the PowerPC memory management model, 
supervisor-level registers, synchronization requirements, and the 
exception model. It also defines the time-base feature from a 
supervisor-level perspective. Implementations that conform to the 
PowerPC OEA also conform to the PowerPC UISA and VEA. 


Optional. A feature, such as an instruction, a register, or an exception, that is 
defined by the PowerPC architecture but not required to be 
implemented. 


Out-of-order. An aspect of an operation that allows it to be performed ahead 
of one that may have preceded it in the sequential model, for 
example, speculative operations. An operation is said to be 
performed out-of-order if, at the time that it is performed, it is not 
known to be required by the sequential execution model. See 
In-order. 


Out-of-order execution. A technique that allows instructions to be issued 
and completed in an order that differs from their sequence in the 
instruction stream. 


Overflow. A condition that occurs during arithmetic operations when the 
result cannot be stored accurately in the destination register(s). For 
example, if two 32-bit numbers are multiplied, the result may not be 
representable in 32 bits. 
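
The multiplication example above can be sketched as follows. This is a simplified model that assumes unsigned 32-bit operands and a hypothetical helper named mul32; it does not model how the processor itself records or reports overflow.

```python
MASK32 = 0xFFFFFFFF


def mul32(a: int, b: int):
    """Multiply two unsigned 32-bit values; return (low 32 bits, overflowed)."""
    full = (a & MASK32) * (b & MASK32)  # exact product, up to 64 bits wide
    low = full & MASK32                 # what fits in a 32-bit destination
    return low, full != low             # overflow if high-order bits were lost


print(mul32(3, 4))              # product fits in 32 bits
print(mul32(0x10000, 0x10000))  # product is 2**32 and does not fit
```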





Packet. A term used in the MPC750 with respect to direct-store operations. 


Page. A region in memory. The OEA defines a page as a 4-Kbyte area of 
memory, aligned on a 4-Kbyte boundary. 


Page access history bits. The changed and referenced bits in the PTE keep 
track of the access history within the page. The referenced bit is set 
by the MMU whenever the page is accessed for a read or write 
operation. The changed bit is set when the page is stored into. See 
Changed bit and Referenced bit. 


Page fault. A page fault is a condition that occurs when the processor 
attempts to access a memory location that resides within a 
page not currently resident in physical memory. On processors that 
use the PowerPC architecture, a page fault exception condition 
occurs when a matching, valid page table entry (PTE[V] = 1) cannot 
be located. 


Page table. A table in memory that is composed of page table entries, or PTEs. 
It is further organized into eight PTEs per PTEG (page table entry 
group). The number of PTEGs in the page table depends on the size 
of the page table (as specified in the SDR1 register). 


Page table entry (PTE). Data structures containing information used to 
translate an effective address to a physical address on a 4-Kbyte page 
basis. A PTE consists of 8 bytes of information in a 32-bit processor 
and 16 bytes of information in a 64-bit processor. 
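
The sizes given in the Page, Page table, and Page table entry definitions can be combined in a short arithmetic sketch for a 32-bit processor. The 64-Kbyte table size and the sample address are assumptions chosen for illustration, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096     # 4-Kbyte page, aligned on a 4-Kbyte boundary
PTE_SIZE = 8         # bytes per PTE on a 32-bit processor
PTES_PER_PTEG = 8    # eight PTEs per page table entry group


def pteg_count(table_bytes: int) -> int:
    """Number of PTEGs that fit in a page table of the given size."""
    return table_bytes // (PTE_SIZE * PTES_PER_PTEG)


def page_base(address: int) -> int:
    """Address of the start of the 4-Kbyte page containing address."""
    return address & ~(PAGE_SIZE - 1)


def page_offset(address: int) -> int:
    """Byte offset of address within its 4-Kbyte page."""
    return address & (PAGE_SIZE - 1)


print(pteg_count(64 * 1024))         # PTEGs in an assumed 64-Kbyte table
print(hex(page_base(0x12345678)))
print(hex(page_offset(0x12345678)))
```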


Physical memory. The actual memory that can be accessed through the 
system’s memory bus. 


Pipelining. A technique that breaks operations, such as instruction 
processing or bus transactions, into smaller distinct stages or tenures 
(respectively) so that a subsequent operation can begin before the 
previous one has completed. 


Precise exceptions. A category of exception for which the pipeline can be 
stopped so instructions that preceded the faulting instruction can 
complete, and subsequent instructions can be flushed and 
redispatched after exception handling has completed. See Imprecise 
exceptions. 


Primary opcode. The most-significant 6 bits (bits 0-5) of the instruction 
encoding that identifies the type of instruction. See Secondary 
opcode. 
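
Because bit 0 is the most-significant bit in this numbering, bits 0-5 are the top six bits of the 32-bit instruction word. The sketch below extracts them with a hypothetical helper; the two sample encodings (addi r3,0,1 and add r3,r4,r5) are supplied here for illustration and are not taken from this glossary.

```python
def primary_opcode(instruction: int) -> int:
    """Return bits 0-5 (the six most-significant bits) of a 32-bit encoding."""
    return (instruction >> 26) & 0x3F


print(primary_opcode(0x38600001))  # addi r3,0,1
print(primary_opcode(0x7C642A14))  # add r3,r4,r5
```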


Protection boundary. A boundary between protection domains. 


Protection domain. A protection domain is a segment, a virtual page, a BAT 
area, or a range of unmapped effective addresses. It is defined only 
when the appropriate relocate bit in the MSR (IR or DR) is 1. 


Quiesce. To come to rest. The processor is said to quiesce when an exception 
is taken or a sync instruction is executed. The instruction stream is 
stopped at the decode stage and executing instructions are allowed to 
complete to create a controlled context for instructions that may be 
affected by out-of-order, parallel execution. See Context 
synchronization. 


Quiet NaN. A type of NaN that can propagate through most arithmetic 
operations without signaling exceptions. A quiet NaN is used to 
represent the results of certain invalid operations, such as invalid 
arithmetic operations on infinities or on NaNs, when invalid. See 
Signaling NaN. 





rA. The rA instruction field is used to specify a GPR to be used as a source 
or destination. 


rB. The rB instruction field is used to specify a GPR to be used as a source. 


rD. The rD instruction field is used to specify a GPR to be used as a 
destination. 


rS. The rS instruction field is used to specify a GPR to be used as a source. 


Real address mode. An MMU mode when no address translation is 
performed and the effective address specified is the same as the 
physical address. The processor’s MMU is operating in real address 
mode if its ability to perform address translation has been disabled 
through the MSR register's IR and/or DR bits. 


Record bit. Bit 31 (or the Rc bit) in the instruction encoding. When it is set, 
the instruction updates the condition register (CR) to reflect the result 
of the operation. 


Referenced bit. One of two page history bits found in each page table entry 
(PTE). The processor sets the referenced bit whenever the page is 
accessed for a read or write. See also Page access history bits. 


Register indirect addressing. A form of addressing that specifies one GPR 
that contains the address for the load or store. 


Register indirect with immediate index addressing. A form of addressing 
that specifies an immediate value to be added to the contents of a 
specified GPR to form the target address for the load or store. 


Register indirect with index addressing. A form of addressing that 
specifies that the contents of two GPRs be added together to yield the 
target address for the load or store. 
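
The two indexed addressing forms defined above reduce to simple additions. The sketch below assumes a 32-bit address space with wraparound and, for the immediate form, a 16-bit signed immediate; both assumptions and all function names are illustrative and go beyond what the definitions state.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed quantity."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def ea_immediate_index(gpr_value: int, immediate: int) -> int:
    """Register indirect with immediate index: GPR contents plus immediate."""
    return (gpr_value + sign_extend16(immediate)) & MASK32


def ea_index(gpr_a: int, gpr_b: int) -> int:
    """Register indirect with index: sum of the contents of two GPRs."""
    return (gpr_a + gpr_b) & MASK32


print(hex(ea_immediate_index(0x1000, 0x0010)))  # positive displacement
print(hex(ea_immediate_index(0x1000, 0xFFFC)))  # displacement of -4
print(hex(ea_index(0x2000, 0x0300)))
```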


Reservation. The processor establishes a reservation on a cache block of 
memory space when it executes an lwarx instruction to read a 
memory semaphore into a GPR. 


RISC (reduced instruction set computing). An architecture characterized 
by fixed-length instructions with nonoverlapping functionality and 
by a separate set of load and store instructions that perform memory 
accesses. 


Secondary cache. A cache memory that is typically larger and has a longer 
access time than the primary cache. A secondary cache may be 
shared by multiple devices. Also referred to as L2, or level-2, cache. 


Set (v). To write a nonzero value to a bit or bit field; the opposite of clear. The 
term ‘set’ may also be used to generally describe the updating of a 
bit or bit field. 


Set (n). A subdivision of a cache. Cacheable data can be stored in a given 
location in any one of the sets, typically corresponding to its 
lower-order address bits. Because several memory locations can map 
to the same location, cached data is typically placed in the set whose 
cache block corresponding to that address was used least recently. 
See Set-associative. 


Set-associative. Aspect of cache organization in which the cache space is 
divided into sections, called sets. The cache controller associates a 
particular main memory address with the contents of a particular set, 
or region, within the cache. 


Signaling NaN. A type of NaN that generates an invalid operation program 
exception when it is specified as an arithmetic operand. See Quiet 
NaN. 


Significand. The component of a binary floating-point number that consists 
of an explicit or implicit leading bit to the left of its implied binary 
point and a fraction field to the right. 


Simplified mnemonics. Assembler mnemonics that represent a more 
complex form of a common operation. 


Slave. The device addressed by a master device. The slave is identified in the 
address tenure and is responsible for supplying or latching the 
requested data for the master during the data tenure. 


Snooping. Monitoring addresses driven by a bus master to detect the need for 
coherency actions. 


Snoop push. Write-backs due to a snoop hit. The block will transition to an 
invalid or exclusive state. 


Split-transaction. A transaction with independent request and response 
tenures. 


Split-transaction bus. A bus that allows address and data transactions from 
different processors to occur independently. 


Static branch prediction. Mechanism by which software (for example, 
compilers) can hint to the machine hardware about the direction a 
branch is likely to take. 


Superscalar machine. A machine that can issue multiple instructions 
concurrently from a conventional linear instruction stream. 


Supervisor mode. The privileged operation state of a processor. In 
supervisor mode, software, typically the operating system, can 
access all control registers and can access the supervisor memory 
space, among other privileged operations. 


Synchronization. A process to ensure that operations occur strictly in order. 
See Context synchronization and Execution synchronization. 


Synchronous exception. An exception that is generated by the execution of 
a particular instruction or instruction sequence. There are two types 
of synchronous exceptions, precise and imprecise. 


System memory. The physical memory available to a processor. 





Tenure. A tenure consists of three phases: arbitration, transfer, termination. 
There can be separate address bus tenures and data bus tenures. 


TLB (translation lookaside buffer). A cache that holds recently-used page 
table entries. 


Throughput. The measure of the number of instructions that are processed 
per clock cycle. 


Transaction. A complete exchange between two bus devices. A transaction 
is minimally comprised of an address tenure; one or more data 
tenures may be involved in the exchange. 


Transfer termination. Signal that refers to both signals that acknowledge 
the transfer of individual beats (of both single-beat transfer and 
individual beats of a burst transfer) and to signals that mark the end 
of the tenure. 





UISA (user instruction set architecture). The level of the architecture to 
which user-level software should conform. The UISA defines the 
base user-level instruction set, user-level registers, data types, 
floating-point memory conventions and exception model as seen by 
user programs, and the memory and programming models. 


Underflow. A condition that occurs during arithmetic operations when the 
result cannot be represented accurately in the destination register. 
For example, underflow can happen if two floating-point fractions 
are multiplied and the result requires a smaller exponent and/or 
mantissa than the single-precision format can provide. In other 
words, the result is too small to be represented accurately. 


User mode. The operating state of a processor used typically by application 
software. In user mode, software can access only certain control 
registers and can access only user memory space. No privileged 
operations can be performed. Also referred to as problem state. 





VEA (virtual environment architecture). The level of the architecture that 
describes the memory model for an environment in which multiple 
devices can access memory, defines aspects of the cache model, 
defines cache control instructions, and defines the time-base facility 
from a user-level perspective. Implementations that conform to the 
PowerPC VEA also adhere to the UISA, but may not necessarily 
adhere to the OEA. 


Virtual address. An intermediate address used in the translation of an 
effective address to a physical address. 


Virtual memory. The address space created using the memory management 
facilities of the processor. Program access to virtual memory is 
possible only when it coincides with physical memory. 





Word. A 32-bit data element. 


Write-back. A cache memory update policy in which processor write cycles 
are directly written only to the cache. External memory is updated 
only indirectly, for example, when a modified cache block is cast out 
to make room for newer data. 


Write-through. A cache memory update policy in which all processor write 
cycles are written to both the cache and memory. 
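
The difference between the write-back and write-through policies defined above can be sketched with a toy model. The class and method names are hypothetical, the model tracks single addresses rather than cache blocks, and it is not a description of the MPC750 cache hardware.

```python
class ToyCache:
    """Toy cache that models only the write policy (illustrative sketch)."""

    def __init__(self, write_through: bool):
        self.write_through = write_through
        self.lines = {}     # address -> value held in the cache
        self.modified = set()  # addresses whose cached value is newer than memory
        self.memory = {}    # address -> value held in external memory

    def write(self, address: int, value: int) -> None:
        self.lines[address] = value
        if self.write_through:
            self.memory[address] = value   # memory is updated on every write
        else:
            self.modified.add(address)     # memory is updated only indirectly

    def cast_out(self, address: int) -> None:
        """Evict an entry, copying modified data back to memory first."""
        if address in self.modified:
            self.memory[address] = self.lines[address]
            self.modified.discard(address)
        self.lines.pop(address, None)


wt = ToyCache(write_through=True)
wt.write(0x100, 7)
print(0x100 in wt.memory)   # memory already holds the new value

wb = ToyCache(write_through=False)
wb.write(0x100, 7)
print(0x100 in wb.memory)   # memory not yet updated
wb.cast_out(0x100)
print(wb.memory[0x100])     # updated when the entry is cast out
```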


60x bus 
eieio instruction, C-72 
L2 cache flushing, C-72 
sync instruction, C-72 


A 


AACK (address acknowledge) signal, 7-14 
ABB (address bus busy) signal, 7-5, 8-8 
Address bus 
address tenure, 8-6 
address transfer 
An, 7-7 
APE, 8-13 
APn, 7-7 
address transfer attribute 
CI, 7-13 
GBL, 7-13 
TBST, 7-12, 8-14 
TSIZn, 7-11, 8-14 
TTn, 7-8, 8-13 
WT, 7-13 
address transfer start 
TS, 7-6, 8-12 
address transfer termination 
AACK, 7-14 
ARTRY, 7-14 
terminating address transfer, 8-17 
arbitration signals, 7-4, 8-7 
bus parking, 8-11 
Address bus pipelining, C-52 
Address translation, see Memory management unit 
Addressing modes, 2-35 
Aligned data transfer, 8-15, 8-17 
Aligned data transfers, C-54 
Alignment 
data transfers, 8-15 
exception, 4-18 
misaligned accesses, 2-29 
rules, 2-29 
An (address bus) signals, 7-7 
APE (address parity error) signal, 8-13 
APn (address parity) signals, 7-7 
Arbitration, system bus, 8-9, 8-19 
Arithmetic instructions 
floating-point, A-15 
integer, A-13 
ARTRY (address retry) signal, 7-14 





B 


BG (bus grant) signal, 7-4, 8-7 
Block address translation, C-3 
block address translation flow, 5-11 
definition, 1-12 
registers 
description, 2-6 
initialization, 5-18 
selection of block address translation, 5-8 
Block diagram, C-5 
Boundedly undefined, definition, 2-33 
BR (bus request) signal, 7-4, 8-7 
Branch fall-through, 6-18 
Branch folding, 6-18 
Branch instructions 
address calculation, 2-54 
condition register logical, 2-55, A-19 
description, A-19 
list of instructions, 2-55, A-19 
system linkage, 2-56, 2-65, A-20 
trap, 2-55, A-20 
Branch prediction, 6-1, 6-22 
Branch processing unit 
branch instruction timing, 6-23 
execution timing, 6-18 
latency, branch instructions, 6-31 
overview, 1-9 
Branch processing unit (BPU) features list, C-6 
Branch resolution 
definition, 6-1 
resource requirements, 6-29 
BTIC (branch target instruction cache), 6-9 
Burst data transfers 
64-bit data bus, 8-15 
transfers with data delays, timing, 8-32 
Bus arbitration, see Data bus 
Bus configurations, 8-33 
Bus interface unit (BIU), 3-2, 3-30 
32-bit data bus mode, C-53 
address bus pipelining, C-52 
aligned data transfer, C-54 
burst ordering, C-54 
bus clocking, C-53 
BVSEL signal, C-51 
D32 mode, selecting, C-56 
features list, C-8 
misaligned data transfers, C-55 
operation, C-51 
signal relationships, C-56 
voltages, C-52 
Bus transactions and L1 cache, 3-22 
BVSEL signal, C-51 
Byte ordering, 2-35 


C 


Cache 
bus interface unit, 3-2, 3-30 
cache arbitration, 6-11 
cache block push operations, C-72 
cache block, definition, 3-3 
cache characteristics, 3-1 
cache coherency 
description, 3-5 
memory/cache access attributes, 3-6 
overview, 3-25 
reaction to bus operations, 3-26 
cache control, 3-13 
cache control instructions 
bus operations, 3-23 
cache control, 3-13 
dcbi, 2-66 
dcbt, 2-63 
cache control instructions, effect on L2 cache, C-72 
cache hit, 6-11 
cache integration, 3-2 
cache locking 
address translation 
data cache locking, C-24 
instruction cache locking, C-28 
BAT examples, C-24 
data cache locking 
address translation, C-24 
disabling exceptions, C-24 
enabling, C-23 
entire cache locking, C-26 
invalidation, C-25 
invalidation (if locked), C-27 
loading, C-26 
locking, C-23 
way locking, C-27 
disabling exceptions 
data cache locking, C-24 
instruction cache locking, C-29 
enabling 
data cache, C-23 
instruction cache, C-28 
entire cache locking definition, C-21 
instruction cache locking 


address translation, C-28 
enabling, C-28 
entire cache locking, C-31 
invalidating instruction cache (if locked), C-32 
prefetching considerations, C-31 
preloading instructions, C-29 
way locking, C-31 
invalidation 
data cache, C-25 
data cache (if locked), C-27 
instruction cache (if locked), C-32 
loading 
data cache, C-26 
instruction cache preloading, C-29 
procedures, C-23 
register summary, C-22 
terminology, C-21 
way locking definition, C-21 
cache management instructions, A-20 
cache miss, 6-14 
cache operations 
cache block push operations, 9-4 
data cache transactions, 3-22 
instruction cache block fill, 3-21 
load/store operations, processor initiated, 3-10 
operations, 3-18 
overview, 3-1, 8-2 
snoop response to bus transactions, 3-26 
cache unit overview, 3-3 
cache-inhibited accesses (I bit), 3-6 
data cache configuration, 3-3 
data cache operation, C-19 
dcbf/dcbst execution, 9-4 
dcbi/dcbz execution, C-72 
differences from MPC750, C-71 
features list, C-7 
icbi, 9-4 
instruction cache configuration, 3-4 
instruction cache operation, C-19 
instruction cache throttling, 10-10 
L1 cache and bus transactions, 3-22 
L1 interface 
cache coherency, C-20 
cache-block-push operations, C-72 
coherency paradoxes, C-20 
coherency precautions, C-20 
dcbz instruction, C-16, C-21 
icbi instruction, C-72 
operation, C-19 
L2 
dcbi instruction, C-72 
stwcx. instruction, C-72, C-72 
L2 interface 
access priorities, C-61 
cache configuration, 9-2 
cache control, C-60 
cache control instructions, C-73 
cache control instructions, effect, C-72 
cache global invalidation, 9-7 
cache initialization, 9-6 
cache testing, 9-9 
clock configuration, 9-10 
clocking, C-64 
coherency, C-62 
dcbf instruction 
when private memory is used, C-73 
dcbi, 9-4 
dcbi instruction 
when private memory is used, C-73 
dcbst instruction 
when private memory is used, C-73 
dcbz instruction 
when private memory is used, C-73 
disabling the cache, C-70 
eciwx instruction 
when private memory is used, C-73 
ecowx instruction 
when private memory is used, C-73 
effect of cache control instructions, C-72 
eieio, 9-4, C-72 
eieio instruction 
when private memory is used, C-73 
enabling the cache, C-70 
features list, C-8 
flushing the cache, C-72 
global invalidation restriction, C-71 
icbi instruction 
when private memory is used, C-73 
L2 cache considerations, 6-15 
L2 cache interface signals, 7-26 
L2ADDR signal, C-69 
L2CR register, C-65 
L2DP signal, C-69 
L2PM register, C-13, C-68 
L2VSEL signal, C-51 
operation, 9-2, C-61 
organization, C-59 
overview, 9-1, C-58 
PB2 SRAM, C-62 
PB3 SRAMs, C-62 
pipelined burst SRAMs, C-76 
private memory operation 
effect of cache control instructions, C-73 
overview, C-60, C-62 
SRAM timing, C-78 
programming considerations, C-70 
registers, C-65 
services, C-61 
single-beat accesses, C-62 
SRAM timing examples, 9-10, C-76 
stwcx. execution, 9-3, C-72 
stwcx. instruction 
when private memory is used, C-73 
sync, 9-4, C-72 
sync instruction 
when private memory is used, C-73 
testing, C-73 
tlbie instruction 
when private memory is used, C-73 
tlbsync instruction 
when private memory is used, C-73 
WIMG bits, C-62 
load/store operations, processor initiated, 3-10 
MEI cache coherency protocol, C-3 
overview, C-19 
PLRU replacement, 3-19 
software table search operations (optional), C-3 
stwcx. execution, 9-3, C-72 
Changed (C) bit maintenance recording, 5-10, 5-21 
Changes from the MPC750, C-2 
Checkstop 
signal, 7-23, 8-35 
state, 4-16 
CI (cache inhibit) signal, 7-13 
CKSTP_IN/CKSTP_OUT, 7-23 
Classes of instructions, 2-33 
Clean block operation, 3-26 
CLK_OUT signal, 7-31 
Clock signals 
PLL_CFGn, 7-31 
SYSCLK, 7-30 
Clocks 
bus clocking, C-53 
L2 clocking, C-64 
Compare instructions 
floating-point, A-16 
integer, A-14 
Completion 
completion unit resource requirements, 6-30 
considerations, 6-16 
definition, 6-1 
Completion unit, C-7 
Context synchronization, 2-36 
Conventions, xxxv, xxxix, 6-1 
COP/scan interface, 8-36 
Copy-back mode, 6-27 
CR (condition register) 
CR logical instructions, 2-55, A-19 
CR, description, 2-4 
CTR register, 2-4 


D 


DABR (data address breakpoint register), 2-8 
DAR (data address register), 2-6 


Data block address translation (DBAT) registers, C-12 


Data bus 
arbitration signals, 7-16, 8-8 
bus arbitration, 8-19 
data tenure, 8-7 
data transfer, 7-17, 8-21 
data transfer termination, 7-20, 8-22 
Data cache 
block push operation, 3-22 
configuration, 3-3 
DCFI, DCE, DLOCK bits, 3-13 
organization, 3-4 
Data organization in memory, 2-29 
Data TLB compare (DCMP) register, C-12, C-38 
Data TLB miss (DMISS) register, C-12 
Data TLB miss address (DMISS) register, C-37 
Data TLB miss for load exception, C-33, C-34 
Data TLB miss for store exception, C-33, C-35 
Data transfers 
alignment, 8-15 
burst ordering, 8-15 
eciwx and ecowx instructions, alignment, 8-17 
operand conventions, 2-29 
signals, 8-21 
DBB (data bus busy) signal, 7-17, 8-8, 8-20 
DBDIS (data bus disable) signal, 7-19 
DBG (data bus grant) signal, 7-16, 8-8 
DBWO (data bus write only) signal, 
7-16, 8-8, 8-21, 8-37 
dcbi, 2-66 
dcbt, 2-63 
DEC (decrementer register), 2-7 
Decrementer exception, 4-19 
Defined instruction class, 2-33 
DHn/DLn (data bus) signals, 7-18 
Dispatch 
considerations, 6-16 
dispatch unit resource requirements, 6-30 
DPn (data bus parity) signals, 7-19 
DRTRY (data retry) signal, 7-21, 8-22, 8-25 
DSI exception, 4-17 
DSISR register, 2-7 
DTLB organization, 5-23 
Dynamic branch prediction, 6-9 





E 


EAR (external access register), 2-8 

Effective address calculation 
address translation, 5-3 
branches, 2-36 
loads and stores, 2-36, 2-46, 2-51 
eieio, 2-62 
EMI protocol, enforcing memory coherency, 8-26 
Enveloped high-priority cache block push 
operation, 3-22 
Error termination, 8-26 
Event counting, 11-10 
Event selection, 11-11 
Exceptions 
alignment exception, 4-18 
decrementer exception, 4-19 
definitions, 4-12 
differences from MPC750, C-32 
DSI exception, 4-17 
enabling and disabling exceptions, 4-10 
exception classes, 4-2 
exception handler code, C-44 
exception handler flow, C-40 
exception prefix (IP) bit, 4-13 
exception priorities, 4-4 
exception processing, 4-7, 4-10 
external interrupt, 4-17 
FP assist exception, 4-20 
FP unavailable exception, 4-19 
instruction TLB miss, C-34 
instruction-related exceptions, 2-37 
ISI exception, 4-17 
machine check exception, 4-14 
MPC755-specific 
data TLB miss for load exception, C-33, C-34 
data TLB miss for store exception, C-33, C-35 
instruction TLB miss exception, C-33, C-34 
performance monitor interrupt, 4-20 
program exception, 4-18 
register settings 
MSR, 4-8, 4-12 
SRR0/SRR1, 4-7 
reset exception, 4-13 
returning from an exception handler, 4-11 
summary table, 4-3 
system call exception, 4-19 
system management interrupt, 4-22 
terminology, 4-2 
thermal management interrupt exception, 4-23 
Execution synchronization, 2-37 
Execution unit timing examples, 6-18 
Execution units, 1-10, C-6 
External control instructions, 2-65, 8-17 


F 


Features, list, 1-4, C-6 

Finish cycle, definition, 6-2 

Floating-point model 
FE0/FE1 bits, 4-10 
FP arithmetic instructions, 2-42, A-15 
FP assist exceptions, 4-20 
FP compare instructions, 2-44, A-16 
FP load instructions, A-18 
FP move instructions, A-19 
FP multiply-add instructions, 2-43, A-16 
FP operand, 2-30 
FP rounding/conversion instructions, 2-43, A-16 
FP store instructions, 2-53, A-19 
FP unavailable exception, 4-19 
FPSCR instructions, 2-44, A-16 
IEEE-754 compatibility, 2-28 
NI bit in FPSCR, 2-30 
Floating-point unit 
execution timing, 6-24 
latency, FP instructions, 6-34 
overview, 1-10, 1-11 
Floating-point unit (FPU), C-6 
Flush block operation, 3-26 
FPRn (floating-point registers), 2-4 
FPSCR (floating-point status and control register) 
FPSCR instructions, 2-44, A-16 
FPSCR register description, 2-4 
NI bit, 2-30 
Functional additions, MPC755 vs. MPC750, C-2 
Functional description, MPC755, C-3 


G 


GBL (global) signal, 7-13 
GPRn (general-purpose registers), 2-4 
Guarded memory bit (G bit), 3-6 


H 


Hardware implementation-dependent register 2 
(HID2), C-12, C-15 
HIDn (hardware implementation-dependent) registers 
HID0 
description, 2-10 
doze bit, 10-3 
DPM enable bit, 10-2 
nap bit, 10-4 
HID1 
description, 2-14 
PLL configuration, 2-14, 7-31 
HRESET (hard reset) signal, 7-24, 8-35 


I 


IABR (instruction address breakpoint register), 2-9 

ICTC (instruction cache throttling control) 
register, 2-22, 10-11 

IEEE 1149.1-compliant interface, 8-36 

Illegal instruction class, 2-34 


Instruction block address translation (IBAT) 
registers, C-12 
Instruction cache 
configuration, 3-4 
instruction cache block fill operations, 3-21 
organization, 3-5 
Instruction cache throttling, 10-10 
Instruction timing 
examples 
cache hit, 6-12 
cache miss, 6-15 
execution unit, 6-18 
instruction flow, 6-8 
memory performance considerations, 6-27 
overview, 6-3 
terminology, 6-1 
Instruction TLB compare (ICMP) register, C-12, C-38 
Instruction TLB miss (IMISS) register, C-12 
Instruction TLB miss address (IMISS) register, C-37 
Instruction TLB miss exception, C-33, C-34 
Instructions 
branch address calculation, 2-54 
branch instructions, 6-8, 6-18, 6-20, A-19 
cache control instructions, 9-4 
cache management instructions, A-20 
classes, 2-33 
condition register logical, 2-55, A-19 
defined instructions, 2-33 
external control instructions, 2-65 
floating-point 
arithmetic, 2-42, A-15 
compare, 2-44, A-16 
FP load instructions, A-18 
FP move instructions, A-19 
FP rounding and conversion, 2-43, A-16 
FP status and control register, 2-44 
FP store instructions, A-19 
FPSCR instructions, A-16 
multiply-add, 2-43, A-16 
illegal instructions, 2-34 
instruction cache throttling, 10-10 
instruction flow diagram, 6-10 
instruction serialization, 6-17 
instruction serialization types, 6-17 
instruction set summary, 2-31 
instruction use, MPC750, C-16 
instruction use, MPC755, C-16 
instructions not implemented, B-1 
integer 
arithmetic, 2-38, A-13 
compare, 2-40, A-14 
load, A-17 
load/store multiple, A-18 
load/store string, A-18 
load/store with byte reverse, A-18 
logical, 2-40, A-14 
rotate and shift, 2-41, A-15 
store, A-17 
integer instructions, 6-32 
isync, 4-12 
isync instruction restriction, C-16 
L2 cache 
dcbf instruction 
when private memory is used, C-73 
dcbi instruction, C-72 
when private memory is used, C-73 
dcbst instruction 
when private memory is used, C-73 
dcbz instruction 
when private memory is used, C-73 
eciwx instruction 
when private memory is used, C-73 
ecowx instruction 
when private memory is used, C-73 
eieio instruction 
when private memory is used, C-73 
icbi instruction 
when private memory is used, C-73 
stwcx. instruction 
hits a modified sector, C-72 
hits an unmodified sector, C-72 
when private memory is used, C-73 
sync instruction 
when private memory is used, C-73 
tlbie instruction 
when private memory is used, C-73 
tlbsync instruction 
when private memory is used, C-73 
latency summary, 6-31 
load and store 
address generation 
floating-point, 2-51 
integer, 2-46 
byte reverse instructions, 2-49, A-18 
floating-point load, A-18 
floating-point move, 2-45, A-19 
floating-point store, 2-52 
handling misalignment, 2-45 
integer load, 2-46, A-17 
integer multiple, 2-49 
integer store, 2-48, A-17 
memory synchronization, 2-59, 2-61, A-18 
multiple instructions, A-18 
string instructions, 2-50, A-18 
lookaside buffer management instructions, A-21 
memory control instructions, 2-62, 2-66 
memory synchronization instructions, 
2-59, 2-61, A-18 
MPC750 and MPC755 
dcbz instruction, C-16, C-21 
MPC750 instruction use, C-16 
MPC755 instruction use, C-16 
mtsr/mtsrin instruction restriction, C-16 
PowerPC instructions set, list, A-1 
PowerPC instructions, list, A-7, A-13 
processor control instructions, 2-56, 2-60, 2-65, A-20 

reserved instructions, 2-35 

restrictions, C-16 

rfi, 4-11 

segment register manipulation instructions, A-21 

stfd instruction, C-16, D-4 

stwcx., 4-12 

support for lwarx/stwcx., 8-36 

sync, 4-12 

system linkage instructions, 2-56, A-20 

TLB management instructions, A-21 

tlbie, 2-67 

tlbld, C-17, C-18 

tlbli, C-17, C-19 

tlbsync, 2-67 

trap instructions, 2-55, A-20 
INT (interrupt) signal, 7-22, 8-34 
Integer arithmetic instructions, 2-38, A-13 
Integer compare instructions, 2-40, A-14 
Integer load instructions, 2-46, A-17 
Integer logical instructions, 2-40, A-14 
Integer rotate/shift instructions, 2-41, A-15 
Integer store gathering, 6-26 
Integer store instructions, 2-48, A-17 
Integer unit (IU), C-6 
Integer unit execution timing, 6-24 
Interrupt, external, 4-17 
ISI exception, 4-17 
isync, 2-62, 4-12 
isync instruction restriction, C-16 
ITLB organization, 5-23 


K 
Kill block operation, 3-26 


L 


L1/L2 interface operation, see Cache 

L2 cache interface operation, see Cache 

L2 private memory control (L2PM) register, C-13 
L2ADDR (L2 address) signal, C-69 

L2ADDRn (L2 address) signals, 7-26 

L2CE (L2 chip enable) signals, 7-28 
L2CLK_OUTA (L2 clock out A) signal, 7-28 
L2CLK_OUTB (L2 clock out B) signal, 7-28 
L2CR (L2 cache control register), 2-25, 9-4 
L2DATAn (L2 data) signals, 7-27 
L2DP (L2 data parity) signal, C-69 
L2DPn (L2 data parity) signals, 7-27 
L2PM (L2 private memory) control register, C-68 
L2SYNC_IN (L2 sync in) signal, 7-29 
L2SYNC_OUT (L2 sync out) signal, 7-29 
L2VSEL signal, C-51 
L2WE (L2 write enable) signal, 7-28 
L2ZZ (L2 low-power mode enable) signal, 7-29 
Latency 
load/store instructions, 6-35 
Latency, definition, 6-2 
Load/store 
address generation, 2-46 
byte reverse instructions, 2-49, A-18 
execution timing, 6-25 
floating-point load instructions, 2-52, A-18 
floating-point move instructions, 2-45, A-19 
floating-point store instructions, 2-52, A-19 
handling misalignment, 2-45 
integer load instructions, 2-46, A-17 
integer store instructions, 2-48, A-17 
latency, load/store instructions, 6-35 
load/store multiple instructions, 2-49, A-18 
memory synchronization instructions, A-18 
string instructions, 2-50, A-18 
Load/store unit (LSU), C-6 
Logical address translation, 5-1 
Logical instructions, integer, A-14 
Lookaside buffer management instructions, A-21 
LR (link register), 2-4 
lwarx/stwcx. support, 8-36 


M 


Machine check exception, 4-14 
MCP (machine check interrupt) signal, 7-22 
MEI protocol 
hardware considerations, 3-9 
read operations, 3-23 
state transitions, 3-31 
Memory accesses, 8-4 
data transfers, C-54 
Memory coherency bit (M bit) 
cache interactions, 3-6 
timing considerations, 6-27 
Memory control instructions 
description, 2-62, 2-66 
segment register manipulation, A-21 
Memory management unit 
address translation flow, 5-11 
address translation mechanisms, 5-7, 5-11 
block address translation, 5-8, 5-11, 5-18 
block diagrams 
32-bit implementations, 5-5 
DMMU, 5-7 
IMMU, 5-6 
exceptions summary, 5-14 
features summary, 5-3 
implementation-specific features, 5-2 
instructions and registers, 5-16 
memory protection, 5-9 
overview, 1-12, 5-2 
page address translation, 5-8, 5-11, 5-26 
page history status, 5-10, 5-19–5-22 
real addressing mode, 5-11, 5-18 
segment model, 5-19 
Memory management unit (MMU) 
DCMP register, C-38 
DMISS register, C-37 
exception handler code, C-44 
exception handler flow, C-40 
features list, C-8 
HASH1/HASH2 registers, C-38 
ICMP register, C-38 
IMISS register, C-37 
MPC755 features, C-35 
software table search operation 
overview, C-39 
registers, C-12, C-37 
resources, C-36 
support, C-3 
tlbld/tlbli instructions, C-17 
Memory synchronization instructions, 
2-59, 2-61, A-18 
Misaligned data transfers, C-55 
Misalignment 
misaligned accesses, 2-29 
misaligned data transfer, 8-17 
MMCRn (monitor mode control registers), 
2-15, 4-20, 11-3 
Modes 
32-bit data bus mode, C-53 
D32 mode, C-56 
MPC745 
features not supported, C-8 
overview, C-1 
MPC750 
address bus pipelining, C-52 
changes in MPC755, C-2 
differences from MPC755 
exceptions, C-32 
programming model, C-10 
thermal management, C-78 
instruction use, C-16 
isync instruction restriction, C-16 
mtsr/mtsrin instruction restriction, C-16 
pipelined burst SRAMs, C-76 
stfd instruction, C-16, D-4 
MPC755 
32-bit data bus mode, C-53 
address bus pipelining, C-52 
block address translation, C-3 
block diagram, C-5 
cache locking, C-21 
core voltage, C-52 
data TLB miss for load exception, C-33, C-34 
data TLB miss for store exception, C-33, C-35 
exceptions, C-32 
features list, C-6 
functional description, C-3 
I/O signal voltage, C-52 
implementation-specific registers, C-12 
instruction cache prefetching considerations, C-31 
instruction TLB miss exception, C-33, C-34 
instruction use, C-16 
isync instruction restriction, C-16 
L1 cache operation, C-19 
memory management unit (MMU), C-35 
mtsr/mtsrin instruction restriction, C-16 
pipelined burst SRAMs, C-76 
programming model, C-10 
PTEG registers, HASH1/HASH2, C-13 
software table search operation (optional), C-3 
stfd instruction, C-16, D-4 
MSR (machine state register) 
bit settings, 4-8 
FE0/FE1 bits, 4-10 
IP bit, 4-13 
PM bit, 2-5 
RI bit, 4-11 
settings due to exception, 4-12 
mtsr/mtsrin instructions restriction, C-16 
Multiple-precision shifts, 2-42 
Multiply-add instructions, A-16 
Multiprocessing support, C-9 


N 
No-DRTRY mode, 8-33 


O 


OEA 
exception mechanism, 4-1 
memory management specifications, 5-1 
registers, 2-5 
Operand conventions, 2-28 
Operand placement and performance, 6-25 
Operating environment architecture (OEA), xxxii 
Operations 
bus operations caused by cache control 
instructions, 3-23 
cache operations, 3-1 
data cache block push, 3-22 
enveloped high-priority cache block push, 3-22 


instruction cache block fill, 3-21 

read operation, 3-23 

response to snooped bus transactions, 3-26 

single-beat write operations, 8-29 
Optional instructions, A-31 
Overview, 1-1 

MPC745, C-1 

MPC755, C-2 


P 


Page address translation 
definition, 1-12 
page address translation flow, 5-26 
page size, 5-19 
selection of page address translation, 5-8, 5-13 
TLB organization, 5-24 
Page history status 
cases of dcbt and dcbtst misses, 5-20 
R and C bit recording, 5-10, 5-19–5-22 
Page table entry (PTE) 
DCMP register, C-12 
ICMP register, C-12 
Page table entry groups (PTEGs) 
HASH1/HASH2 registers, C-13 
Page table updates, 5-31 
Page tables 
resources for table search operations, C-36 
RPA register, C-13 
software table search operation, C-39 
software table search registers, C-37 
SPRG(4-7) registers, C-12 
Performance monitor, C-78 
event counting, 11-10 
event selecting, 11-11 
performance monitor interrupt, 4-20, 11-2 
performance monitor SPRs, 11-3 
purposes, 11-1 
registers, 11-3 
warnings, 11-12 
Phase-locked loop, 10-3 
Physical address generation, 5-1 
Pipeline 
instruction timing, definition, 6-2 
pipeline stages, 6-7 
pipelined execution unit, 6-4 
superscalar/pipeline diagram, 6-5 
Pipelined burst SRAMs, C-76 
PMC1 and PMC2 registers, 1-25 
PMCn (performance monitor counter) registers, 
2-17, 4-20, 11-6 
Power and ground signals, 7-31 
Power management 
doze mode, 10-3 
doze, nap, sleep, DPM bits, 2-14 
dynamic power management, 10-1 
features list, C-9 
full-power mode, 10-2 
nap mode, 10-3 
overview, C-78 
programmable power modes, 10-2 
sleep mode, 10-4 
software considerations, 10-5 
Power-on reset (POR) 
L2PM initialization, C-13 
PowerPC architecture 
instruction list, A-1, A-7, A-13 
operating environment architecture (OEA), xxxii 
user instruction set architecture (UISA), xxxi 
virtual environment architecture (VEA), xxxi 
Power-saving modes, C-4 
Primary hash address (HASH1) register, C-13, C-38 
Priorities, exception, 4-4 
Private memory SRAM, C-78 
Process switching, 4-12 
Processor control instructions, 2-56, 2-60, 2-65, A-20 
Program exception, 4-18 
Program order, definition, 6-2 
Programmable power states 
doze mode, 10-3 
full-power mode with DPM enabled/disabled, 10-2 
nap mode, 10-3 
sleep mode, 10-4 
Programming model, C-10 
Protection of memory areas 
no-execute protection, 5-12 
options available, 5-9 
protection violations, 5-14 
PVR (processor version register), 2-5, C-14 


Q 


QACK (quiescent acknowledge) signal, 7-25 
QREQ (quiescent request) signal, 7-25, 8-35 
Qualified bus grant, 8-7 

Qualified data bus grant, 8-20 





R 


Read operation, 3-26 
Read-atomic operation, 3-26 
Read-with-intent-to-modify operation, 3-26 
Real address (RA), see Physical address generation 
Real addressing mode (translation disabled) 
data accesses, 5-11, 5-18 
instruction accesses, 5-11, 5-18 
support for real addressing mode, 5-2 
Referenced (R) bit maintenance recording, 
5-10, 5-20, 5-29 
Registers 


cache locking register summary, C-22 
implementation-specific 
DBAT(4-7), C-12 
DCMP, C-12 
DMISS, C-12 
HASH(1-2), C-13 
HID2, C-12, C-15 
IBAT(4-7), C-12 
ICMP, C-12 
ICTC, 2-22, 10-11 
IMISS, C-12 
L2CR, 2-25, 9-4, C-65 
L2PM, C-13, C-68 
MMCR0, 2-15, 4-20, 11-3 
MMCR1, 2-17, 4-20, 11-5 
RPA, C-13 
SIA, 2-21, 4-20 
SPRG(4-7), C-12 
THRMn, 2-22, 10-7 
UMMCR0, 2-16 
UMMCR1, 2-17 
UPMCn, 2-21 
USIA, 2-21 
MPC750 programming model, 2-3 
not implemented 
MSR, TGPR bit, C-12 
performance monitor registers, 2-14 
reset settings, 2-27 
SPR encodings, 2-58 
supervisor-level 
BAT registers, 2-6 
DABR, 2-8 
DAR, 2-6 
DCMP, C-38 
DEC, 2-7 
DMISS, C-37 
DSISR, 2-7 
EAR, 2-8 
HASH1/HASH2, C-38 
HID0, 2-10, 10-2 
HID1, 2-14 
HID2, C-12, C-15 
IABR, 2-9 
ICMP, C-38 
ICTC, 2-22, 10-11 
IMISS, C-37 
L2CR, 2-25, 9-4, C-65 
L2PM, C-13, C-68 
MMCR0, 2-15, 4-20, 11-3 
MMCR1, 2-17, 4-20, 11-5 
MSR, 2-5 
PMC1 and PMC2, 1-25 
PMCn, 2-17, 4-20 
PVR, 2-5, C-14 
SDR1, 2-6 
SIA, 2-21, 4-20, 11-10 
SPRGn, 2-6 
SPRs for performance monitor, 11-1 
SRn, 2-6 
SRR0/SRR1, 2-7 
THRMn, 2-22, 10-7 
time base (TB), 2-7 
user-level 
CR, 2-4 
CTR, 2-4 
FPRn, 2-4 
FPSCR, 2-4 
GPRn, 2-4 
LR, 2-4 
time base (TB), 2-5, 2-7 
UMMCR0, 2-16 
UMMCR1, 2-17 
UPMCn, 2-21 
USIA, 2-21, 11-10 
XER, 2-4 
Rename buffer, definition, 6-2 
Rename buffers, C-7 
Rename register operation, 6-17 
Required physical address (RPA) register, C-13 
Reservation station, definition, 6-2 
Reserved instruction class, 2-35 
Reset 
HRESET signal, 7-24, 8-35 
reset exception, 4-13 
SRESET signal, 7-24, 8-35 
Restrictions 
MPC750 
isync instruction, C-16 
MPC755 
isync instruction, C-16 
Retirement, definition, 6-2 
rfi, 4-11 
Rotate/shift instructions, 2-41, A-15 
RSRV (reserve) signal, 7-25, 8-36 


S 


SDR1 register, 2-6 
Secondary hash address (HASH2) register, 
C-13, C-38 
Segment registers 
SR description, 2-6 
SR manipulation instructions, 2-67, A-21 
Segmented memory model, 
see Memory management unit 
Serializing instructions, 6-17 
Shift/rotate instructions, 2-41, A-15 
SIA (sampled instruction address) register, 
2-21, 4-20, 11-10 
Signals 
32-bit data bus signal relationships, C-56 
AACK, 7-14 

ABB, 7-5, 8-8 

address arbitration, 7-4, 8-7 
address transfer, 8-12 

address transfer attribute, 8-13 
An, 7-7 

APn, 7-7 

ARTRY, 7-14, 8-22 

BG, 7-4, 8-7 

BR, 7-4, 8-7 

BVSEL, C-51 

checkstop, 8-35 

CI, 7-13 
CKSTP_IN/CKSTP_OUT, 7-23 
CLK_OUT, 7-31 
configuration, 7-2 

COP/scan interface, 8-36 

data arbitration, 8-8, 8-19 
data transfer termination, 8-22 
DBB, 7-17, 8-8, 8-20 
DBDIS, 7-19 

DBG, 7-16, 8-8 

DBWO, 7-16, 8-8, 8-21, 8-37 
DHn/DLn, 7-18 

DPn, 7-19 

DRTRY, 7-21, 8-22, 8-25 
GBL, 7-13 

HRESET, 7-24 

INT, 7-22, 8-34 

L2 cache interface signals, 7-26 
L2ADDR, C-69 

L2ADDRn, 7-26 

L2CE, 7-28 

L2CLK_OUTA, 7-28 
L2CLK_OUTB, 7-28 
L2DATAn, 7-27 

L2DP, 7-27, C-69 
L2SYNC_IN, 7-29 
L2SYNC_OUT, 7-29 
L2VSEL, C-51 

L2WE, 7-28 

L2ZZ, 7-29 

MCP, 7-22 

MPC755-specific signals, C-51 
PLL_CFGn, 7-31 

power and ground signals, 7-31 
QACK, 7-25 

QREQ, 7-25, 8-35 

reset, 8-35 

RSRV, 7-25, 8-36 

SMI, 4-23, 7-22 

SRESET, 7-24, 8-35 

system quiesce control, 8-35 
TA, 7-20 

TBEN, 7-26 

TBST, 7-12, 8-14, 8-21 
TEA, 7-21, 8-22, 8-26 
TLBISYNC, 7-26 
transfer encoding, 7-9 


TS, 7-6 
TSIZn, 7-11, 8-14 
TTn, 7-8, 8-13 
WT, 7-13 

single, C-3 


Single-beat transfer 
reads with data delays, timing, 8-30 
reads, timing, 8-28 
termination, 8-22 
writes, timing, 8-29 
SMI (system management interrupt) signal, 
4-23, 7-22 
Snooping, 3-25 
Software table search 
optional, C-3 
registers, C-12 
SPRG(4-7), C-12 
tlbld/tlbli instructions, C-17 
Special-purpose registers (SPRGn), C-12 
Split-bus transaction, 8-8 
SPRGn registers, 2-6 
SRESET (soft reset) signal, 7-24, 8-35 
SRR0/SRR1 (status save/restore registers) 
description, 2-7 
exception processing, 4-7 
key bit derivation (SRR1), C-34 
Stage, definition, 6-2 
Stall, definition, 6-3 
Static branch prediction, 6-9, 6-22 
stwcx., 4-12 
Superscalar, definition, 6-3 
sync, 4-12 
SYNC operation, 3-26 
Synchronization 
context/execution synchronization, 2-36 
execution of rfi, 4-11 
memory synchronization instructions, 
2-59, 2-61, A-18 
SYSCLK (system clock) signal, 7-30 
System call exception, 4-19 


System interface, see Bus interface unit (BIU) 


System linkage instructions, 2-56, 2-65 
list of instructions, A-20 
System management interrupt, 4-22, 10-1 





System quiesce control signals (QACK/QREQ), 8-35 


System register unit 
execution timing, 6-27 
latency, CR logical instructions, 6-32 
latency, system register instructions, 6-31 
System register unit (SRU), C-7 


T 


TA (transfer acknowledge) signal, 7-20 
Table search flow (primary and secondary), 5-29 
TBEN (time base enable) signal, 7-26 
TBL/TBU (time base lower and upper) registers, 
2-5, 2-7 
TBST (transfer burst) signal, 7-12, 8-14, 8-21 
TEA (transfer error acknowledge) signal, 7-21, 8-26 
Termination, 8-17, 8-22 
Thermal assist unit (TAU), 10-5 
Thermal management 
differences from MPC750, C-78 
features list, C-9 
Thermal management interrupt exception, 4-23 
THRMn (thermal management) registers, 2-22, 10-7 
Throughput, definition, 6-3 
Timing considerations, 6-7 
Timing diagrams, interface 
address transfer signals, 8-12 
burst transfers with data delays, 8-32 
L2 cache SRAM timing, 9-10, C-76 
single-beat reads, 8-28 
single-beat reads with data delays, 8-30 
single-beat writes, 8-29 
single-beat writes with data delays, 8-31 
use of TEA, 8-32 
using DBWO, 8-37 
Timing, instruction 
BPU execution timing, 6-18 
branch timing example, 6-23 
cache hit, 6-12 
cache miss, 6-15 
execution unit, 6-18 
FPU execution timing, 6-24 
instruction dispatch, 6-16 
instruction flow, 6-8 
instruction scheduling guidelines, 6-28 
IU execution timing, 6-24 
latency summary, 6-31 
load/store unit execution timing, 6-25 
overview, 6-3 
SRU execution timing, 6-27 
stage, definition, 6-2 
TLB 
description, 5-23 
invalidate (tlbie instruction), 5-25, 5-31 
LRU replacement, 5-24 
organization for ITLB and DTLB, 5-23 
TLB miss and table search operation, 5-23, 5-27 
TLB invalidate 
description, 5-25 
TLB management instructions, 2-68, A-21 
TLB miss, effect, 6-28 
tlbie, 2-67 
TLBISYNC (TLBI sync) signal, 7-26 
tlbld instruction, C-18 
tlbli instruction, C-19 
tlbsync, 2-67 
Transactions, data cache, 3-22 
Transfer, 8-12, 8-21 
Transfers 

aligned transfers, C-54 

misaligned transfers, C-54 
Trap instructions, 2-55 
TS (transfer start) signal, 7-6, 8-12 
TSIZn (transfer size) signals, 7-11, 8-14 
TTn (transfer type) signals, 7-8, 8-13 


U 


UMMCR0 (user monitor mode control register 0), 
2-16, 11-5 
UMMCR1 (user monitor mode control register 1), 
2-17, 11-6 
UPMCn (user performance monitor counter) 
registers, 2-21, 11-9 
Use of TEA, timing, 8-32 
User instruction set architecture (UISA) 
registers, 2-2 
User instruction set architecture (UISA) 
description, xxxi 
USIA (user sampled instruction address) register, 
2-21, 11-10 
Using DBWO, timing, 8-37 








V 


Virtual environment architecture (VEA), xxxi 


W 


WIMG bits, 8-26 
Write-back, definition, 6-3 
Write-through mode (W bit) 

cache interactions, 3-6 
Write-with-Atomic operation, 3-26 
Write-with-Flush operation, 3-26 
Write-with-Kill operation, 3-26 
WT (write-through) signal, 7-13 


X 
XER register, 2-4 